totem: Drop invalid join msg in operational state

According to the Totem paper, if a processor receives a join message
in the operational state and the receiver's identifier is in the join
message's fail list, then the join message should be ignored.

Validating join messages this way avoids unnecessary switches from the
operational state to the gather state, which in the worst case keep
rings from ever merging, as in the following scenario.

1. Initially, there is a single ring containing three nodes, say
   ring(A,B,C).
2. A network partition separates A and B and, at the same time, C goes
   down.
3. Node A sends join message with proclist:A,B,C. faillist:NULL.
   Node B sends join message with proclist:A,B,C. faillist:NULL.
4. Both A and B hit the consensus timeout because of the network
   partition.
5. The partition between A and B heals.
6. Node A sends join message with proclist:A,B,C. faillist:B,C. and
   creates ring(A).
   Node B sends join message with proclist:A,B,C. faillist:A,C. and
   creates ring(B).
7. Because the network has remerged, node A receives the join message
   with proclist:A,B,C. faillist:A,C sent by node B.
8. Node A shifts to gather state and sends out a modified join message
   with proclist:A,B,C. faillist:B. Such a join message prevents A and
   B from merging.
9. Node A hits the consensus timeout (caused by waiting for node C) and
   again sends a join message with proclist:A,B,C. faillist:B,C.

The same thing happens on node B, so A and B loop forever through
steps 7, 8 and 9.

The paper also says: "If a processor receives a join message in the
operational state and if the sender's identifier is in the receiver's
my_proclist and the join message's ring_seq is less than the receiver's
ring sequence number, then it ignores the join message too." This patch
therefore applies both validations of join messages together.
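
The two rules can be sketched in isolation as follows. This is only a
minimal illustration, not the totemsrp code: the struct and function
names are hypothetical, node identifiers are reduced to plain ints, and
the membership list stands in for my_proclist on the assumption that
the two are identical in the operational state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real totemsrp structures. */
struct join_msg {
	int sender;                   /* identifier of the sending node */
	const int *fail_list;         /* nodes the sender considers failed */
	size_t fail_entries;
	unsigned long long ring_seq;  /* ring sequence number in the message */
};

struct node_state {
	int my_id;
	const int *memb_list;         /* assumed equal to my_proclist here */
	size_t memb_entries;
	unsigned long long ring_seq;  /* receiver's ring sequence number */
};

/* Linear membership test over a small array of node identifiers. */
static bool contains(const int *set, size_t n, int id)
{
	for (size_t i = 0; i < n; i++) {
		if (set[i] == id) {
			return true;
		}
	}
	return false;
}

/* Returns true when an operational node should drop the join message. */
static bool ignore_join_when_operational(
	const struct node_state *self,
	const struct join_msg *join)
{
	/* Rule 1: the receiver is in the join message's fail list. */
	if (contains(join->fail_list, join->fail_entries, self->my_id)) {
		return true;
	}

	/* Rule 2: a current member sent a join with an older ring_seq. */
	if (contains(self->memb_list, self->memb_entries, join->sender) &&
	    join->ring_seq < self->ring_seq) {
		return true;
	}

	return false;
}
```

Applied to step 7 above, node A (operational in ring(A)) receives B's
join with faillist:A,C. Rule 1 matches because A is in that fail list,
so A stays operational instead of shifting to gather state.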

Signed-off-by: Jason <huzhijiang@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Jason 12 years ago
commit cfbb021e13
1 changed file with 33 additions and 1 deletion

exec/totemsrp.c

@@ -4400,6 +4400,36 @@ static void memb_merge_detect_endian_convert (
 	srp_addr_copy_endian_convert (&out->system_from, &in->system_from);
 }
 
+static int ignore_join_under_operational (
+	struct totemsrp_instance *instance,
+	const struct memb_join *memb_join)
+{
+	struct srp_addr *proc_list;
+	struct srp_addr *failed_list;
+	unsigned long long ring_seq;
+
+	proc_list = (struct srp_addr *)memb_join->end_of_memb_join;
+	failed_list = proc_list + memb_join->proc_list_entries;
+	ring_seq = memb_join->ring_seq;
+
+	if (memb_set_subset (&instance->my_id, 1,
+	    failed_list, memb_join->failed_list_entries)) {
+		return (1);
+	}
+
+	/*
+	 * In operational state, my_proc_list is exactly the same as
+	 * my_memb_list.
+	 */
+	if ((memb_set_subset (&memb_join->system_from, 1,
+	    instance->my_memb_list, instance->my_memb_entries)) &&
+	    (ring_seq < instance->my_ring_id.seq)) {
+		return (1);
+	}
+
+	return (0);
+}
+
 static int message_handler_memb_join (
 	struct totemsrp_instance *instance,
 	const void *msg,
@@ -4430,7 +4460,9 @@ static int message_handler_memb_join (
 	}
 	switch (instance->memb_state) {
 		case MEMB_STATE_OPERATIONAL:
-			memb_join_process (instance, memb_join);
+			if (!ignore_join_under_operational (instance, memb_join)) {
+				memb_join_process (instance, memb_join);
+			}
 			break;
 
 		case MEMB_STATE_GATHER: