Affects Version/s: unspecified
Fix Version/s: None
Operating System: All
External issue ID: 8794
The Robot failure is the usual 120-second timeout, which can be caused by multiple bugs (here from the transaction writer for a module-based shard, tell-based protocol):
RequestTimeoutException: Timed out after 120.029805238seconds
Looking at the karaf log of member-1 (the writer and old leader), we can see that leadership was successfully transferred at 04:04:59,250, but the information about the new leader was lost:
2017-07-04 04:04:59,252 | INFO | lt-dispatcher-42 | LocalActorRef | 174 - com.typesafe.akka.slf4j - 2.4.18 | Message [org.opendaylight.controller.cluster.datastore.messages.ShardLeaderStateChanged] from Actorakka://opendaylight-cluster-data/user/shardmanager-config/member-1-shard-default-config#145361760 to Actorakka://opendaylight-cluster-data/user/shardmanager-config/member-1-shard-default-config/member-1-shard-default-config-notifier#-591265397 was not delivered.  dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
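The dead-letter log line itself names the relevant Akka settings. While trying to reproduce this, dead-letter logging can be made more verbose in the actor-system configuration (the values below are illustrative, not the deployment's actual settings):

```hocon
akka {
  # Log every dead letter instead of only the default sample
  log-dead-letters = on
  log-dead-letters-during-shutdown = on
}
```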
So the member knew it was a Follower, but it was unable to tell the client who the new leader is.
2017-07-04 04:05:19,248 | WARN | monPool-worker-2 | AbstractShardBackendResolver | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.Carbon | Failed to resolve shard
java.util.concurrent.TimeoutException: Shard has no current leader
Perhaps there is a common underlying bug that causes occasional undelivered messages, and we see different symptoms depending on which message is lost.
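The failure mode described above can be sketched with a simplified model (this is not ODL code; the class and function names are invented for illustration): a leader-changed notification that goes to dead letters instead of reaching the notifier leaves the client without leader information, so a later resolve attempt times out with "Shard has no current leader".

```python
import time

class ShardState:
    """Toy stand-in for the client's view of a shard (hypothetical, not ODL)."""
    def __init__(self):
        self.leader = None        # unknown until a leader-changed message arrives
        self.dead_letters = []    # messages that were never delivered

    def deliver_leader_changed(self, new_leader, delivered=True):
        # In the reported bug the ShardLeaderStateChanged message ended up
        # in dead letters, so the client never learned who the new leader is.
        if delivered:
            self.leader = new_leader
        else:
            self.dead_letters.append(("ShardLeaderStateChanged", new_leader))

def resolve_shard(state, timeout_s=0.1, poll_s=0.01):
    """Poll for leader info until a deadline, mimicking a resolver timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if state.leader is not None:
            return state.leader
        time.sleep(poll_s)
    raise TimeoutError("Shard has no current leader")

# Healthy path: the notification is delivered and resolution succeeds.
ok = ShardState()
ok.deliver_leader_changed("member-2", delivered=True)
assert resolve_shard(ok) == "member-2"

# Buggy path: the notification is lost, and the resolver times out.
lost = ShardState()
lost.deliver_leader_changed("member-2", delivered=False)
try:
    resolve_shard(lost)
    timed_out = False
except TimeoutError:
    timed_out = True
assert timed_out and lost.dead_letters
```

The sketch only illustrates why a single lost message can surface later as an apparently unrelated timeout; which message is lost would determine which symptom appears.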