controller / CONTROLLER-1690

Frontend client tries to reconnect to the same member which is no longer leader

Details

    • Type: Bug
    • Status: Resolved
    • Resolution: Duplicate
    • Component: clustering
    • Operating System: All
    • Platform: All

    Description

      I had most of this written when I realized member-1 received UnreachableMember just around the time the shard replica removal was happening, so in the end this might be just another symptom of the general CONTROLLER-1645 behavior.
      I am opening this anyway, just to have this particular symptom documented.

      Scenario: A prefix-based shard (and thus the tell-based protocol) is created, with a transaction producer active on the leader member. The replica on that member is then removed, and the transaction producer is expected to continue once one of the other members becomes the new leader; the flow is sketched below.
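
      For context, the test flow corresponds roughly to the following sketch. DataProducer and ClusterAdmin are hypothetical stand-ins for the Robot suite steps and the cluster-admin operations; they are not actual controller APIs.

      // Hypothetical sketch of the test flow; DataProducer and ClusterAdmin are
      // illustrative stand-ins, not real controller interfaces.
      import java.util.concurrent.TimeUnit;

      interface DataProducer {
          /** Writes one entry through the prefix-based shard; throws on failure. */
          void writeNext() throws Exception;
      }

      interface ClusterAdmin {
          /** Removes the shard replica hosted on the given member. */
          void removeShardReplica(String memberName, String shardName);
      }

      final class ReconnectScenario {
          static void run(DataProducer producer, ClusterAdmin admin) throws Exception {
              // 1. The producer is active on the current leader (member-3 in the logs).
              for (int i = 0; i < 10; i++) {
                  producer.writeNext();
              }

              // 2. Remove the leader's replica; leadership must move to another member.
              admin.removeShardReplica("member-3", "id-ints");

              // 3. The producer is expected to keep working once a new leader is
              //    elected; in this run it failed with NotLeaderException instead.
              for (int i = 0; i < 10; i++) {
                  producer.writeNext();
                  TimeUnit.MILLISECONDS.sleep(100);
              }
          }
      }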

      Robot sees the producer fail [0].
      The message corresponds to what is in karaf.log [1]:
      2017-05-19 05:13:04,133 | ERROR | ult-dispatcher-2 | ClientActorBehavior | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.Carbon | member-3-frontend-datastore-Shard-id-ints!: failed to resolve shard 0
      org.opendaylight.controller.cluster.access.commands.NotLeaderException: Actor Actor[akka://opendaylight-cluster-data/user/shardmanager-config/member-3-shard-id-ints!-config#300839828] is not the current leader
      at org.opendaylight.controller.cluster.datastore.Shard.handleConnectClient(Shard.java:435)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.Carbon]
      at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:305)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.Carbon]
      at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:270)[193:org.opendaylight.controller.sal-akka-raft:1.5.0.Carbon]
      at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:31)[192:org.opendaylight.controller.sal-clustering-commons:1.5.0.Carbon]
      at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)[180:com.typesafe.akka.persistence:2.4.17]
      at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104)[192:org.opendaylight.controller.sal-clustering-commons:1.5.0.Carbon]
      at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)[173:com.typesafe.akka.actor:2.4.17]
      at akka.actor.Actor$class.aroundReceive(Actor.scala:497)[173:com.typesafe.akka.actor:2.4.17]
      at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)[180:com.typesafe.akka.persistence:2.4.17]
      at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:664)[180:com.typesafe.akka.persistence:2.4.17]
      at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:183)[180:com.typesafe.akka.persistence:2.4.17]
      at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)[180:com.typesafe.akka.persistence:2.4.17]
      at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[173:com.typesafe.akka.actor:2.4.17]
      at akka.actor.ActorCell.invoke(ActorCell.scala:495)[173:com.typesafe.akka.actor:2.4.17]
      at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[173:com.typesafe.akka.actor:2.4.17]
      at akka.dispatch.Mailbox.run(Mailbox.scala:224)[173:com.typesafe.akka.actor:2.4.17]
      at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[173:com.typesafe.akka.actor:2.4.17]
      at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
      at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
      at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
      at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
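
      The exception comes from the shard rejecting a client connect request while it is not the leader. A simplified model of that check, not the actual Shard.handleConnectClient code, looks like this:

      // Simplified model of the leader check that rejects client connections;
      // an illustrative sketch only, not the actual Shard implementation.
      final class ShardConnectModel {

          enum RaftRole { LEADER, FOLLOWER, CANDIDATE }

          static final class NotLeaderException extends Exception {
              NotLeaderException(String actorPath) {
                  super("Actor " + actorPath + " is not the current leader");
              }
          }

          private final String selfPath;
          private RaftRole role = RaftRole.FOLLOWER;

          ShardConnectModel(String selfPath) {
              this.selfPath = selfPath;
          }

          void becomeLeader()   { role = RaftRole.LEADER; }
          void becomeFollower() { role = RaftRole.FOLLOWER; }

          /** Accepts a connect request only when this replica is the leader. */
          String handleConnectClient(String clientId) throws NotLeaderException {
              if (role != RaftRole.LEADER) {
                  // This is the failure path seen in the log: member-3 had already
                  // lost (or given up) leadership when the frontend tried to reconnect.
                  throw new NotLeaderException(selfPath);
              }
              return "connected: " + clientId + " -> " + selfPath;
          }
      }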

      In this case member-3 was the original leader, and the (expected) message at 05:13:04,063 shows that the client detected member-3 is no longer the leader.
      The client should attempt to contact the new leader, but, as mentioned at the start, the new leader might have been unreachable at the time.
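
      The retry the client is expected to perform is, very roughly, to treat NotLeaderException as retriable and keep re-resolving the backend until a reachable leader answers. The sketch below only illustrates that retry shape; LeaderResolver and Backend are hypothetical types, not part of cds-access-client.

      // Illustrative retry loop with exponential backoff; LeaderResolver and
      // Backend are hypothetical types, not the cds-access-client API.
      import java.time.Duration;
      import java.util.Optional;

      interface LeaderResolver {
          /** Asks the shard manager which member currently leads the shard, if any. */
          Optional<Backend> resolveLeader(String shardName);
      }

      interface Backend {
          boolean isReachable();
          String memberName();
      }

      final class ReconnectSketch {
          static Backend reconnect(LeaderResolver resolver, String shardName,
                                   int maxAttempts) throws InterruptedException {
              Duration backoff = Duration.ofMillis(100);
              for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                  Optional<Backend> leader = resolver.resolveLeader(shardName);
                  // Retry when there is no leader yet, or when the advertised leader
                  // is currently unreachable (the CONTROLLER-1645 situation).
                  if (leader.isPresent() && leader.get().isReachable()) {
                      return leader.get();
                  }
                  Thread.sleep(backoff.toMillis());
                  backoff = backoff.multipliedBy(2);
              }
              throw new IllegalStateException("No reachable leader for " + shardName
                      + " after " + maxAttempts + " attempts");
          }
      }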

      This fix [2] should make this situation less likely anyway.

      [0] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/697/archives/log.html.gz#s1-s22-t1-k2-k10
      [1] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/697/archives/odl3_karaf.log.gz
      [2] https://git.opendaylight.org/gerrit/57074

    People

      Assignee: Unassigned
      Reporter: Vratko Polak (vrpolak)
