[CONTROLLER-1692] C: produce-transactions rpc returned NotLeaderException when local prefix shard replica removed Created: 19/May/17  Updated: 25/Jul/23  Resolved: 02/Jun/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Peter Gubka Assignee: Robert Varga
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 8515

 Description   

scenario: clean leader shutdown

produce-transactions is started on the prefix shard leader node. Then shard replica is removed on this node and produce-transactions returns
java.util.concurrent.ExecutionException: org.opendaylight.controller.cluster.access.commands.NotLeaderException: Actor Actorakka://opendaylight-cluster-data/user/shardmanager-config/member-2-shard-id-ints!-config#2021325772 is not the current leader\n\tat com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)\n\tat com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:272)\n\tat

https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/698/archives/log.html.gz#s1-s22-t1-k2-k11-k1

The node logs https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/698/archives/odl2_karaf.log.gz
017-05-19 08:18:58,304 | INFO | ult-dispatcher-3 | ClientActorBehavior | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.Carbon | member-2-frontend-datastore-Shard-id-ints!: connection ConnectedClientConnection{client=ClientIdentifier

{frontend=member-2-frontend-datastore-Shard-id-ints!, generation=0}

, cookie=0, backend=ShardBackendInfo{actor=Actorakka://opendaylight-cluster-data/user/shardmanager-config/member-2-shard-id-ints!-config#2021325772, sessionId=0, version=BORON, maxMessages=1000, cookie=0, shard=id-ints!, dataTree present=true}} indicated no leadership, reconnecting it
org.opendaylight.controller.cluster.access.commands.NotLeaderException: Actor Actorakka://opendaylight-cluster-data/user/shardmanager-config/member-2-shard-id-ints!-config#2021325772 is not the current leader
at org.opendaylight.controller.cluster.datastore.Shard.handleRequest(Shard.java:454)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.Carbon]
at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:292)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.Carbon]
at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:270)[193:org.opendaylight.controller.sal-akka-raft:1.5.0.Carbon]
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:31)[192:org.opendaylight.controller.sal-clustering-commons:1.5.0.Carbon]
at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)[180:com.typesafe.akka.persistence:2.4.17]
at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104)[192:org.opendaylight.controller.sal-clustering-commons:1.5.0.Carbon]
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.Actor$class.aroundReceive(Actor.scala:497)[173:com.typesafe.akka.actor:2.4.17]
at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)[180:com.typesafe.akka.persistence:2.4.17]
at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:664)[180:com.typesafe.akka.persistence:2.4.17]
at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:183)[180:com.typesafe.akka.persistence:2.4.17]
at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)[180:com.typesafe.akka.persistence:2.4.17]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.ActorCell.invoke(ActorCell.scala:495)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.run(Mailbox.scala:224)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[173:com.typesafe.akka.actor:2.4.17]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]



 Comments   
Comment by Vratko Polak [ 26/May/17 ]

This happened again, even for module-based shards (tell-based protocol, [0] codebase, so only on Sandbox [1]).

Karaf.log [2] segment:
2017-05-25 21:31:28,752 | WARN | monPool-worker-2 | AbstractShardBackendResolver | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Failed to resolve shard
org.opendaylight.controller.cluster.access.commands.NotLeaderException: Actor Actorakka://opendaylight-cluster-data/user/shardmanager-config/member-1-shard-default-config#-199254016 is not the current leader
at org.opendaylight.controller.cluster.datastore.Shard.handleConnectClient(Shard.java:435)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:305)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:270)[193:org.opendaylight.controller.sal-akka-raft:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:31)[192:org.opendaylight.controller.sal-clustering-commons:1.5.0.SNAPSHOT]
at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)[180:com.typesafe.akka.persistence:2.4.17]
at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104)[192:org.opendaylight.controller.sal-clustering-commons:1.5.0.SNAPSHOT]
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.Actor$class.aroundReceive(Actor.scala:497)[173:com.typesafe.akka.actor:2.4.17]
at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)[180:com.typesafe.akka.persistence:2.4.17]
at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:664)[180:com.typesafe.akka.persistence:2.4.17]
at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:183)[180:com.typesafe.akka.persistence:2.4.17]
at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)[180:com.typesafe.akka.persistence:2.4.17]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.ActorCell.invoke(ActorCell.scala:495)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.run(Mailbox.scala:224)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[173:com.typesafe.akka.actor:2.4.17]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
2017-05-25 21:31:28,763 | ERROR | lt-dispatcher-22 | ClientActorBehavior | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.SNAPSHOT | member-1-frontend-datastore-config: failed to resolve shard 0
org.opendaylight.controller.cluster.access.commands.NotLeaderException: Actor Actorakka://opendaylight-cluster-data/user/shardmanager-config/member-1-shard-default-config#-199254016 is not the current leader
at org.opendaylight.controller.cluster.datastore.Shard.handleConnectClient(Shard.java:435)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:305)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:270)[193:org.opendaylight.controller.sal-akka-raft:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:31)[192:org.opendaylight.controller.sal-clustering-commons:1.5.0.SNAPSHOT]
at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)[180:com.typesafe.akka.persistence:2.4.17]
at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104)[192:org.opendaylight.controller.sal-clustering-commons:1.5.0.SNAPSHOT]
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.Actor$class.aroundReceive(Actor.scala:497)[173:com.typesafe.akka.actor:2.4.17]
at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)[180:com.typesafe.akka.persistence:2.4.17]
at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:664)[180:com.typesafe.akka.persistence:2.4.17]
at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:183)[180:com.typesafe.akka.persistence:2.4.17]
at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)[180:com.typesafe.akka.persistence:2.4.17]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.ActorCell.invoke(ActorCell.scala:495)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.run(Mailbox.scala:224)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[173:com.typesafe.akka.actor:2.4.17]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
2017-05-25 21:31:28,764 | WARN | lt-dispatcher-22 | ConcurrentDOMDataBroker | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Tx: DOM-CHAIN-1-1002 Error during phase CAN_COMMIT, starting Abort
org.opendaylight.controller.cluster.access.commands.NotLeaderException: Actor Actorakka://opendaylight-cluster-data/user/shardmanager-config/member-1-shard-default-config#-199254016 is not the current leader
at org.opendaylight.controller.cluster.datastore.Shard.handleConnectClient(Shard.java:435)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:305)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:270)[193:org.opendaylight.controller.sal-akka-raft:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:31)[192:org.opendaylight.controller.sal-clustering-commons:1.5.0.SNAPSHOT]
at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)[180:com.typesafe.akka.persistence:2.4.17]
at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104)[192:org.opendaylight.controller.sal-clustering-commons:1.5.0.SNAPSHOT]
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.Actor$class.aroundReceive(Actor.scala:497)[173:com.typesafe.akka.actor:2.4.17]
at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)[180:com.typesafe.akka.persistence:2.4.17]
at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:664)[180:com.typesafe.akka.persistence:2.4.17]
at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:183)[180:com.typesafe.akka.persistence:2.4.17]
at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)[180:com.typesafe.akka.persistence:2.4.17]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.ActorCell.invoke(ActorCell.scala:495)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.run(Mailbox.scala:224)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[173:com.typesafe.akka.actor:2.4.17]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
Caused by: org.opendaylight.controller.cluster.access.commands.NotLeaderException: Actor Actorakka://opendaylight-cluster-data/user/shardmanager-config/member-1-shard-default-config#-199254016 is not the current leader
at org.opendaylight.controller.cluster.datastore.Shard.handleRequest(Shard.java:454)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:292)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
... 19 more

[0] https://git.opendaylight.org/gerrit/#/c/57822/1
[1] https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/15/archives/log.html.gz#s1-s20-t1-k2-k10-k1
[2] https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/15/archives/odl1_karaf.log.gz

Comment by Robert Varga [ 29/May/17 ]

This happens when the leader moves in the window between when we discover it and when our connect request arrives at it.

Comment by Robert Varga [ 29/May/17 ]

carbon: https://git.opendaylight.org/gerrit/57951

Comment by Peter Gubka [ 02/Jun/17 ]

Looks ok now with controller=96/57996/5:99/57699/5
https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/14/log.html.gz#s1-s22

Generated at Wed Feb 07 19:56:12 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.