A CPG client can sometimes lock up if the local node is in the downlist

In a 10-node cluster where all nodes are booting up and starting corosync
at the same time, sometimes during this process corosync detects a node as
leaving and rejoining the cluster.

Occasionally the downlist that gets picked contains the local node. When the
local node sends leave events for the downlist (including itself), it sets
its cpd state to CPD_STATE_UNJOINED and clears the cpd->group_name. This
means it no longer sends CPG events to the CPG client, so the client locks up.

Fix this by resetting the cpd only when the leave reason is
CONFCHG_CPG_REASON_LEAVE.

Reviewed-by: Jan Friesse <jfriesse@redhat.com>
(cherry picked from commit 08f07be323b777118264eb37413393065b360f8e)
Tim Beale, 14 years ago
parent
commit
ac1d79ea7c
1 changed file with 2 additions and 1 deletion
services/cpg.c (+2 -1)

@@ -683,7 +683,8 @@ static int notify_lib_joinlist(
 				}
 				if (left_list_entries) {
 					if (left_list[0].pid == cpd->pid &&
-						left_list[0].nodeid == api->totem_nodeid_get()) {
+						left_list[0].nodeid == api->totem_nodeid_get() &&
+						left_list[0].reason == CONFCHG_CPG_REASON_LEAVE) {
 
 						cpd->pid = 0;
 						memset (&cpd->group_name, 0, sizeof(cpd->group_name));
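
The tightened condition in the hunk above can be illustrated in isolation. This is a minimal sketch, not the corosync code: the `sketch_` names are hypothetical, the numeric reason values are assumptions, and it assumes that leave events generated from a downlist carry a reason other than an explicit leave (for example a node-down reason).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CONFCHG_CPG_REASON_* constants.
 * The numeric values here are assumptions for illustration only. */
enum sketch_reason {
	SKETCH_REASON_JOIN = 1,
	SKETCH_REASON_LEAVE = 2,
	SKETCH_REASON_NODEDOWN = 3
};

/* Hypothetical, simplified view of one left_list entry. */
struct sketch_left_entry {
	uint32_t nodeid;
	uint32_t pid;
	enum sketch_reason reason;
};

/*
 * Returns true only when the entry describes this client on this node
 * leaving explicitly. Before the patch the reason was not checked, so a
 * downlist entry naming the local node also matched.
 */
static bool sketch_should_unjoin(const struct sketch_left_entry *left,
				 uint32_t local_nodeid, uint32_t client_pid)
{
	return left->pid == client_pid &&
	       left->nodeid == local_nodeid &&
	       left->reason == SKETCH_REASON_LEAVE;
}
```

With this check, an entry for the local client with reason `SKETCH_REASON_LEAVE` still triggers the reset, while the same entry with `SKETCH_REASON_NODEDOWN` leaves the client joined so it keeps receiving CPG events.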