A CPG client can sometimes lock up if the local node is in the downlist

In a 10-node cluster where all nodes boot and start corosync at the same
time, corosync sometimes detects a node as leaving and rejoining the
cluster during startup.

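Occasionally the downlist that gets picked contains the local node. When the
local node sends leave events for the downlist (including itself), it sets
its cpd state to CPD_STATE_UNJOINED and clears cpd->group_name. From then on
it no longer sends CPG events to the CPG client, so the client appears to
lock up.

Only reset the cpd when the leave reason is CONFCHG_CPG_REASON_LEAVE, so
that a downlist entry for the local node no longer unjoins the client.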

Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Tim Beale, 14 years ago
Commit 08f07be323
1 changed file with 2 additions and 1 deletion

+ 2 - 1
services/cpg.c

@@ -720,7 +720,8 @@ static int notify_lib_joinlist(
 				}
 				if (left_list_entries) {
 					if (left_list[0].pid == cpd->pid &&
-						left_list[0].nodeid == api->totem_nodeid_get()) {
+						left_list[0].nodeid == api->totem_nodeid_get() &&
+						left_list[0].reason == CONFCHG_CPG_REASON_LEAVE) {
 
 						cpd->pid = 0;
 						memset (&cpd->group_name, 0, sizeof(cpd->group_name));
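The patched condition can be illustrated in isolation. The following is a minimal sketch, not the code in services/cpg.c: the types sketch_reason, sketch_member and sketch_cpd, the joined flag (standing in for the cpd state), and both helper functions are hypothetical names introduced only for this example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the corosync types. These names are
 * assumptions for illustration, not the definitions in services/cpg.c. */
enum sketch_reason {
	SKETCH_REASON_LEAVE,    /* the client explicitly left the group */
	SKETCH_REASON_NODEDOWN  /* the node was reported in a downlist */
};

struct sketch_member {
	uint32_t nodeid;
	uint32_t pid;
	enum sketch_reason reason;
};

struct sketch_cpd {
	uint32_t pid;
	char group_name[128];
	bool joined;            /* stands in for the cpd state */
};

/* Mirrors the patched check: the entry must describe this client on the
 * local node AND be an explicit leave, not a downlist entry. */
static bool is_own_explicit_leave(const struct sketch_cpd *cpd,
	const struct sketch_member *left, uint32_t local_nodeid)
{
	return left->pid == cpd->pid &&
		left->nodeid == local_nodeid &&
		left->reason == SKETCH_REASON_LEAVE;
}

/* Resets the client bookkeeping only for an explicit leave. */
static void handle_left_entry(struct sketch_cpd *cpd,
	const struct sketch_member *left, uint32_t local_nodeid)
{
	if (is_own_explicit_leave(cpd, left, local_nodeid)) {
		cpd->pid = 0;
		memset(cpd->group_name, 0, sizeof(cpd->group_name));
		cpd->joined = false;
	}
}
```

With this check, a left-list entry that names the local client but carries a node-down reason leaves the bookkeeping untouched, so the client would keep receiving events. Only an explicit leave clears pid and group_name.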