
A CPG client can sometimes lock up if the local node is in the downlist

In a 10-node cluster where all nodes boot and start corosync at the same
time, corosync sometimes detects a node as leaving and rejoining the
cluster during startup.

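Occasionally the downlist that gets picked contains the local node. When the
local node sends leave events for the downlist (including itself), it sets
its cpd state to CPD_STATE_UNJOINED and clears cpd->group_name. From then on
it no longer sends CPG events to the CPG client, so the client locks up.

Only unjoin the client when the left_list entry's reason is
CONFCHG_CPG_REASON_LEAVE, so a downlist entry for the local node is no
longer treated as the client's own leave.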
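
The check described above can be sketched outside corosync as a small
standalone C model. This is an illustration only, not the code in
services/cpg.c: every `sketch_*` name, the struct layouts, the enum values
and the `joined` flag (standing in for the cpd state) are assumptions made
for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the leave reasons; the values are illustrative. */
enum sketch_reason {
	SKETCH_REASON_LEAVE = 1,    /* the client asked to leave the group */
	SKETCH_REASON_NODEDOWN = 2  /* the node appeared in a downlist */
};

/* Hypothetical left_list entry. */
struct sketch_member {
	uint32_t nodeid;
	uint32_t pid;
	enum sketch_reason reason;
};

/* Hypothetical per-client state, loosely modelled on cpd. */
struct sketch_client {
	uint32_t pid;
	char group_name[32];
	bool joined;
};

/*
 * Return true only if the first left_list entry is this client on the
 * local node AND the entry was produced by an explicit leave.
 */
static bool sketch_should_unjoin(const struct sketch_client *cpd,
	const struct sketch_member *left_list, size_t left_list_entries,
	uint32_t local_nodeid)
{
	assert(cpd != NULL);

	if (left_list == NULL || left_list_entries == 0) {
		return false;
	}
	return left_list[0].pid == cpd->pid &&
		left_list[0].nodeid == local_nodeid &&
		left_list[0].reason == SKETCH_REASON_LEAVE;
}

/* Clear the client's group membership only for a genuine leave. */
static void sketch_apply_left_list(struct sketch_client *cpd,
	const struct sketch_member *left_list, size_t left_list_entries,
	uint32_t local_nodeid)
{
	if (sketch_should_unjoin(cpd, left_list, left_list_entries,
		local_nodeid)) {
		cpd->pid = 0;
		memset(cpd->group_name, 0, sizeof(cpd->group_name));
		cpd->joined = false;
	}
}
```

In this model, a downlist entry that matches the local client leaves
`joined` and `group_name` untouched, so the client would keep receiving
events, while an explicit leave still clears them.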

Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Tim Beale, 14 years ago
Parent
Current commit
08f07be323
1 file changed, 2 insertions and 1 deletion

services/cpg.c (+2 -1)

@@ -720,7 +720,8 @@ static int notify_lib_joinlist(
 				}
 				if (left_list_entries) {
 					if (left_list[0].pid == cpd->pid &&
-						left_list[0].nodeid == api->totem_nodeid_get()) {
+						left_list[0].nodeid == api->totem_nodeid_get() &&
+						left_list[0].reason == CONFCHG_CPG_REASON_LEAVE) {
 
 						cpd->pid = 0;
 						memset (&cpd->group_name, 0, sizeof(cpd->group_name));