Details
-
Bug
-
Status: Resolved
-
Resolution: Done
-
Lithium
-
None
-
None
-
Operating System: All
Platform: All
-
4563
Description
Recently Gary Wu intermittently saw a cluster island form when the first seed node was restarted while the other members were up. I also recently saw this in a production environment.
According to Akka's docs, on startup the first seed node tries every second to contact the other seed nodes, up to the seed-node-timeout. If none respond, it joins itself and declares itself the leader. The default seed-node-timeout is 5 sec, which may not be enough in some cases. Akka has a mechanism that gates a node for 5 sec before allowing a reconnect (it logs info messages about this). I think that if the timing is right, the 5 sec gate could consume the entire seed-node-timeout, so the restarted node gives up before it can reach the other seed nodes.
Gary increased the seed-node-timeout and that alleviated the island issue. I think we should increase the default to 15 sec to be safe.
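For reference, a minimal sketch of what the proposed change might look like in an Akka configuration file. The setting name akka.cluster.seed-node-timeout is Akka's own. Where this fragment lives in our distribution, and whether it is nested under another top-level block, is an assumption and may differ. The gate is mentioned only in a comment: I believe it corresponds to akka.remote.retry-gate-closed-for in classic remoting, but that is not confirmed here.

```hocon
akka {
  cluster {
    # Proposed value: 15s instead of Akka's 5s default, so the first seed
    # node keeps trying the other seed nodes for longer before it joins itself.
    seed-node-timeout = 15s
  }

  # Not changed by this proposal. The 5 sec gate described above is assumed
  # to be akka.remote.retry-gate-closed-for (classic remoting, default 5 s).
}
```

With a 15 sec timeout, the first seed node should still have roughly 10 sec of join attempts left after a 5 sec gate expires, which is the margin the proposal is aiming for.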