Details
- Type: Bug
- Status: Resolved
- Resolution: Duplicate
- Operating System: All
- Platform: All
- External issue ID: 8513
Description
I had most of this written when I realized that member-1 received UnreachableMember right around the time the shard replica removal was happening, so in the end this might just be another symptom of the general CONTROLLER-1645 behavior.
Opening this anyway, just to have this particular symptom documented.
Scenario: a prefix-based shard (hence the tell-based protocol) is created, with a transaction producer active on the leader member. The replica on that member is then removed, and the transaction producer is expected to carry on once one of the other members becomes the new leader (sketched below).
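For illustration, a rough Java sketch of the scenario. Every name in it (ClusterFixture, createPrefixShard, removeReplica, the producer shape) is a hypothetical stand-in, not the actual MD-SAL sharding API; it only captures the shape of the test:

    // All names here are hypothetical stand-ins, not the real MD-SAL API.
    ClusterFixture cluster = ClusterFixture.of("member-1", "member-2", "member-3");

    // Create a prefix-based shard (tell-based protocol) replicated on all
    // members; assume member-3 ends up as the shard leader.
    cluster.createPrefixShard("/id-ints", cluster.allMembers());

    // A transaction producer is active on the leader member; writes succeed.
    Producer producer = cluster.member("member-3").createProducer("/id-ints");
    producer.writeAndCommit("/id-ints/item-1");

    // Remove the shard replica on the leader member...
    cluster.removeReplica("member-3", "/id-ints");

    // ...and the producer is expected to carry on once another member
    // becomes the new leader. In this run it failed with NotLeaderException
    // instead (see the karaf.log excerpt below).
    producer.writeAndCommit("/id-ints/item-2");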
Robot sees the producer fail [0].
The message corresponds to what is in karaf.log [1]:
2017-05-19 05:13:04,133 | ERROR | ult-dispatcher-2 | ClientActorBehavior | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.Carbon | member-3-frontend-datastore-Shard-id-ints!: failed to resolve shard 0
org.opendaylight.controller.cluster.access.commands.NotLeaderException: Actor Actor[akka://opendaylight-cluster-data/user/shardmanager-config/member-3-shard-id-ints!-config#300839828] is not the current leader
at org.opendaylight.controller.cluster.datastore.Shard.handleConnectClient(Shard.java:435)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.Carbon]
at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:305)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.Carbon]
at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:270)[193:org.opendaylight.controller.sal-akka-raft:1.5.0.Carbon]
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:31)[192:org.opendaylight.controller.sal-clustering-commons:1.5.0.Carbon]
at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)[180:com.typesafe.akka.persistence:2.4.17]
at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104)[192:org.opendaylight.controller.sal-clustering-commons:1.5.0.Carbon]
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.Actor$class.aroundReceive(Actor.scala:497)[173:com.typesafe.akka.actor:2.4.17]
at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)[180:com.typesafe.akka.persistence:2.4.17]
at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:664)[180:com.typesafe.akka.persistence:2.4.17]
at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:183)[180:com.typesafe.akka.persistence:2.4.17]
at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)[180:com.typesafe.akka.persistence:2.4.17]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.ActorCell.invoke(ActorCell.scala:495)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.run(Mailbox.scala:224)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[173:com.typesafe.akka.actor:2.4.17]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
In this case member-3 was the original leader, and the (expected) message at 05:13:04,063 shows that the client detected member-3 was no longer the leader.
The client should then attempt to contact the new leader, but as mentioned at the start, the new leader may have been unreachable at the time.
This fix [2] should make this situation less likely in any case.
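For context, a minimal sketch of the recovery loop the client is expected to perform. NotLeaderException is the real exception class from cds-access-client; ShardBackend and resolveLeader are hypothetical stand-ins for the connection logic in ClientActorBehavior:

    import java.time.Duration;
    import org.opendaylight.controller.cluster.access.commands.NotLeaderException;

    // Hypothetical sketch; only NotLeaderException is a real class here.
    ShardBackend connectToShard(final long shardId) throws InterruptedException {
        Duration backoff = Duration.ofMillis(100);
        while (true) {
            try {
                // Ask the member we currently believe to be the shard leader.
                return resolveLeader(shardId).connect();
            } catch (NotLeaderException e) {
                // The contacted member lost leadership: back off, re-resolve,
                // and retry. If the new leader is unreachable at the moment
                // (the CONTROLLER-1645 case), this keeps backing off until it
                // becomes reachable again instead of failing the producer.
                Thread.sleep(backoff.toMillis());
                backoff = backoff.multipliedBy(2);
            }
        }
    }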
[0] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/697/archives/log.html.gz#s1-s22-t1-k2-k10
[1] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/697/archives/odl3_karaf.log.gz
[2] https://git.opendaylight.org/gerrit/57074
Issue Links
- duplicates CONTROLLER-1689 "stopping resolution of shard 0 on stale connection: AskTimeoutException" (Resolved)