CONTROLLER-1468

[Clustering] Datastore operations failure when leader is down


Details

    • Bug
    • Status: Resolved
    • Resolution: Cannot Reproduce
    • Beryllium
    • None
    • clustering
    • None
    • Operating System: All
      Platform: All

    • 4923

    Description

      Test cases:
      ===========
      Topology: 3 nodes. Leader restart with 1 switch and 1K flows per DPN (add, modify, and delete while one node is down)

      Steps:
      =======
      1. In a 3-node cluster (c1, c2, c3), c2 is the leader and the rest are followers.
      2. Added 1000 flows via c1 and verified that they were replicated across all nodes.
      3. Brought the leader c2 down; c3 became the new leader.
      4. Deleted 300 flows via follower c1.
      5. A GET operation on the remaining nodes, c1 and c3, still returns 1000 flows. The expectation was 700 flows on each node.
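
The expectation in step 5 can be written down as a small check. This is only a sketch of the arithmetic in the steps above; the function names are hypothetical and the per-node counts are the values reported in this issue, entered by hand rather than fetched from the controllers.

```python
from typing import Dict


def expected_flow_count(initial: int, added: int = 0, deleted: int = 0) -> int:
    """Flow count every node should report once replication has caught up."""
    return initial + added - deleted


def nodes_out_of_sync(observed: Dict[str, int], expected: int) -> Dict[str, int]:
    """Return only the nodes whose observed flow count differs from the expectation."""
    return {node: count for node, count in observed.items() if count != expected}


# Values from the steps above: 1000 flows added, 300 deleted after c2 went down.
expected = expected_flow_count(initial=1000, deleted=300)

# Counts reported by the GET operation on the two surviving nodes.
observed = {"c1": 1000, "c3": 1000}

print("expected:", expected)
print("out of sync:", nodes_out_of_sync(observed, expected))
```

With the reported counts, both c1 and c3 are flagged, which matches the failure described in step 5.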

      Please find attached logs of c1 and c3.

      A log similar to the following is seen when attempting to delete 300 flows from follower c1 or c3
      =========================================================
      2015-12-14 02:32:44,594 | INFO | qtp440143820-841 | FlowTesterConcurrentImpl | 158 - org.opendaylight.openflowplugin.applications.bulk-o-matic - 0.2.0.SNAPSHOT | Using Concurrent Flow Tester Impl
      2015-12-14 02:32:47,525 | INFO | Pool-13-worker-1 | FlowTesterConcurrentImpl | 158 - org.opendaylight.openflowplugin.applications.bulk-o-matic - 0.2.0.SNAPSHOT | Total Flows read: 1001
      2015-12-14 02:33:01,577 | INFO | qtp440143820-849 | FlowTesterImpl | 158 - org.opendaylight.openflowplugin.applications.bulk-o-matic - 0.2.0.SNAPSHOT | Using Sequential Flow Tester Impl
      2015-12-14 02:33:06,602 | WARN | lt-dispatcher-44 | ConcurrentDOMDataBroker | 138 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.SNAPSHOT | Tx: DOM-9707 Error during phase CAN_COMMIT, starting Abort
      akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@10.183.181.43:2550/), Path (/user/shardmanager-config/member-3-shard-inventory-config)]] after [5000 ms]
      at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)[123:com.typesafe.akka.actor:2.3.14]
      at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)[123:com.typesafe.akka.actor:2.3.14]
      at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)[120:org.scala-lang.scala-library:2.10.4.v20140209-180020-VFINAL-b66a39653b]
      at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)[120:org.scala-lang.scala-library:2.10.4.v20140209-180020-VFINAL-b66a39653b]
      at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)[123:com.typesafe.akka.actor:2.3.14]
      at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)[123:com.typesafe.akka.actor:2.3.14]
      at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)[123:com.typesafe.akka.actor:2.3.14]
      at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)[123:com.typesafe.akka.actor:2.3.14]
      at java.lang.Thread.run(Thread.java:745)[:1.7.0_65]
      2015-12-14 02:33:06,609 | ERROR | CommitFutures-10 | FlowTesterImpl | 158 - org.opendaylight.openflowplugin.applications.bulk-o-matic - 0.2.0.SNAPSHOT | Error: TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@10.183.181.43:2550/), Path (/user/shardmanager-config/member-3-shard-inventory-config)]] after [5000 ms]]]} in Datastore write operation: dpid: 1, tableId: 1, sourceIp: 2
      2015-12-14 02:33:07,975 | INFO | qtp440143820-849 | FlowTesterConcurrentImpl | 158 - org.opendaylight.openflowplugin.applications.bulk-o-matic - 0.2.0.SNAPSHOT | Using Concurrent Flow Tester Impl
      2015-12-14 02:33:10,772 | INFO | Pool-15-worker-1 | FlowTesterConcurrentImpl | 158 - org.opendaylight.openflowplugin.applications.bulk-o-matic - 0.2.0.SNAPSHOT | Total Flows read: 1000
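
For anyone triaging the attached karaf logs, the two signals in the excerpt above (the "Total Flows read" counts and the AskTimeoutException target) can be pulled out with a short script. This is a rough sketch, not part of bulk-o-matic or the controller: the function names are hypothetical, and the patterns are written only against the message text shown in this issue, so other log formats may need adjusting.

```python
import re
from typing import Dict, List, Optional

# Matches the bulk-o-matic read-count message seen in the excerpt.
FLOWS_READ = re.compile(r"Total Flows read: (\d+)")

# Matches the Akka ask-timeout message; \s* tolerates the line wrap before the path.
ASK_TIMEOUT = re.compile(
    r"Ask timed out on \[ActorSelection\[Anchor\((?P<anchor>[^)]*)\),\s*Path\s*"
    r"\((?P<path>[^)]*)\)\]\] after \[(?P<ms>\d+) ms\]"
)


def flows_read(log_text: str) -> List[int]:
    """Return every 'Total Flows read' value, in log order."""
    return [int(n) for n in FLOWS_READ.findall(log_text)]


def first_ask_timeout(log_text: str) -> Optional[Dict[str, object]]:
    """Return the target and timeout of the first ask timeout, or None if absent."""
    match = ASK_TIMEOUT.search(log_text)
    if match is None:
        return None
    return {
        "anchor": match.group("anchor"),
        "shard": match.group("path").rsplit("/", 1)[-1],
        "timeout_ms": int(match.group("ms")),
    }


# A few lines copied from the excerpt in this issue (bundle details trimmed).
SAMPLE = """\
2015-12-14 02:32:47,525 | INFO | Pool-13-worker-1 | FlowTesterConcurrentImpl | Total Flows read: 1001
akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@10.183.181.43:2550/), Path
(/user/shardmanager-config/member-3-shard-inventory-config)]] after [5000 ms]
2015-12-14 02:33:10,772 | INFO | Pool-15-worker-1 | FlowTesterConcurrentImpl | Total Flows read: 1000
"""

print(flows_read(SAMPLE))
print(first_ask_timeout(SAMPLE))
```

Running the same functions over the full c1 and c3 logs would show which shard the timed-out requests were addressed to and how the read counts changed around the failed delete.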

      The attached logs are from the remaining nodes, c3 (new leader) and c1 (follower), captured after the former leader (controller c2) was brought down.

      Attachments

        1. c3.karaf.rar
          8 kB
        2. c1.karaf.rar
          14 kB


          People

            Assignee: Unassigned
            Reporter: Muthukumaran Kothandaraman (muthukumaran.k@ericsson.com)
            Votes: 0
            Watchers: 3
