
If failed_to_recv is set, consensus can be empty

If failed_to_recv is set (the node detects that it is unable to receive
messages), we can hit the assert, because my_failed_list and
my_member_list are the same list, so no members are left once the
failed nodes are subtracted. This happens because we do not follow the
specification: we allow a node to mark itself as failed.
Because a single-node membership is created (ignoring both the failed
list and member_list) when failed_to_recv is set and consensus across
nodes has been reached, we can skip the assert.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
(cherry picked from commit d4db2ea5353c8eedb64a88ae413c04e0757378c9)
Jan Friesse, 13 years ago
commit 1c17696e6f
1 changed file with 15 additions and 0 deletions
  exec/totemsrp.c  +15 -0

exec/totemsrp.c

@@ -1247,6 +1247,16 @@ static int memb_consensus_agreed (
 			break;
 		}
 	}
+
+	if (agreed && instance->failed_to_recv == 1) {
+		/*
+		 * Both nodes agreed on our failure. We don't care how many proc list items left because we
+		 * will create single ring anyway.
+		 */
+
+		 return (agreed);
+	}
+
 	assert (token_memb_entries >= 1);
 
 	return (agreed);
@@ -3625,6 +3635,11 @@ printf ("token seq %d\n", token->seq);
 			instance->my_aru_count = 0;
 		}
 
+		/*
+		 * We really don't follow specification there. In specification, OTHER nodes
+		 * detect failure of one node (based on aru_count) and my_id IS NEVER added
+		 * to failed list (so node never mark itself as failed)
+		 */
 		if (instance->my_aru_count > instance->totem_config->fail_to_recv_const &&
 			token->aru_addr == instance->my_id.addr[0].nodeid) {