controller / CONTROLLER-1330

Clustering errors under concurrent load

Details

    • Bug
    • Status: Resolved
    • Resolution: Done
    • Post-Helium
    • None
    • clustering
    • None
    • Operating System: All
      Platform: All

    • 3329

    Description

      I set up a simple 3-node cluster and ran the flow_config_blaster.py script from the integration repo against it. The script creates flows using multiple threads. Whenever I use multiple threads (e.g. 10), a substantial majority of the add-flow requests fail. It seems that only the first request succeeds, while the other concurrent requests get stuck. This is with the latest integration build from master.
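The concurrent load pattern described above can be sketched roughly as follows. This is not the flow_config_blaster.py code; `run_concurrent` and `fake_send` are hypothetical names, and `fake_send` is a stand-in for whatever function actually issues one add-flow request and reports whether it succeeded.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_concurrent(send, n_threads, requests_per_thread):
    """Call send(thread_id, seq) from n_threads workers and tally outcomes."""

    def worker(thread_id):
        tally = Counter()
        for seq in range(requests_per_thread):
            try:
                ok = send(thread_id, seq)
            except Exception:
                # Treat any exception (e.g. a timeout) as a failed request.
                ok = False
            tally["ok" if ok else "failed"] += 1
        return tally

    total = Counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for tally in pool.map(worker, range(n_threads)):
            total.update(tally)
    return total


def fake_send(thread_id, seq):
    # Hypothetical stand-in for one add-flow request; always succeeds.
    return True


if __name__ == "__main__":
    print(run_concurrent(fake_send, n_threads=10, requests_per_thread=5))
```

Running the same harness with `n_threads=1` and then `n_threads=10` against the same sender gives a direct comparison between the sequential and concurrent cases.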

      The leader node produces log entries like these:

      2015-05-20 13:07:18,443 | WARN | lt-dispatcher-17 | ConcurrentDOMDataBroker | 212 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Tx: DOM-336 Error during phase CAN_COMMIT, starting Abort
      akka.pattern.AskTimeoutException: Ask timed out on ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-1-shard-inventory-config#-1821580364)] after [5000 ms]
      at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)[197:com.typesafe.akka.actor:2.3.10]
      at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)[197:com.typesafe.akka.actor:2.3.10]
      at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)[194:org.scala-lang.scala-library:2.10.4.v20140209-180020-VFINAL-b66a39653b]
      at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)[194:org.scala-lang.scala-library:2.10.4.v20140209-180020-VFINAL-b66a39653b]
      at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)[197:com.typesafe.akka.actor:2.3.10]
      at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)[197:com.typesafe.akka.actor:2.3.10]
      at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)[197:com.typesafe.akka.actor:2.3.10]
      at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)[197:com.typesafe.akka.actor:2.3.10]
      at java.lang.Thread.run(Thread.java:745)[:1.7.0_75]
      2015-05-20 13:07:18,446 | ERROR | lt-dispatcher-17 | LocalThreePhaseCommitCohort | 212 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Failed to prepare transaction member-1-txn-344 on backend
      akka.pattern.AskTimeoutException: Ask timed out on ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-1-shard-inventory-config#-1821580364)] after [5000 ms]
      at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)[197:com.typesafe.akka.actor:2.3.10]
      at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)[197:com.typesafe.akka.actor:2.3.10]
      at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)[194:org.scala-lang.scala-library:2.10.4.v20140209-180020-VFINAL-b66a39653b]
      at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)[194:org.scala-lang.scala-library:2.10.4.v20140209-180020-VFINAL-b66a39653b]
      at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)[197:com.typesafe.akka.actor:2.3.10]
      at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)[197:com.typesafe.akka.actor:2.3.10]
      at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)[197:com.typesafe.akka.actor:2.3.10]
      at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)[197:com.typesafe.akka.actor:2.3.10]
      at java.lang.Thread.run(Thread.java:745)[:1.7.0_75]

      The follower nodes, meanwhile, produce log entries like this:

      2015-05-20 13:07:14,093 | WARN | lt-dispatcher-17 | Shard | 205 - org.opendaylight.controller.sal-akka-raft - 1.2.0.SNAPSHOT | ApplyState took more time than expected. Elapsed Time = 72 ms ApplyState = ApplyState{identifier='null', replicatedLogEntry.index =1771, startTime=1828178749622}

      Sequential add-flow (i.e. one thread only) works fine. Multithreaded add-flow also works fine when the cluster has only a single node.
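The log entries quoted above share a pipe-delimited layout (timestamp, level, thread, logger, bundle, message), so warnings and errors can be tallied per logger when comparing the leader and follower logs. The sketch below assumes that six-field layout holds for every entry; `parse_entry` and `summarize` are hypothetical helper names, and stack-trace lines, which contain no `|`, are skipped.

```python
from collections import Counter

# Field order as it appears in the quoted log entries.
FIELDS = ("timestamp", "level", "thread", "logger", "bundle", "message")


def parse_entry(line):
    """Split one log line into named fields; return None for non-entry lines."""
    parts = [part.strip() for part in line.split("|", len(FIELDS) - 1)]
    if len(parts) != len(FIELDS):
        return None  # e.g. a stack-trace or exception line
    return dict(zip(FIELDS, parts))


def summarize(lines):
    """Count log entries per (level, logger) pair."""
    counts = Counter()
    for line in lines:
        entry = parse_entry(line)
        if entry is not None:
            counts[(entry["level"], entry["logger"])] += 1
    return counts
```

For example, `summarize(open(path))` with `path` pointing at a node's log file would return a `Counter` keyed by pairs such as `("WARN", "Shard")`.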

          People

            Assignee: Unassigned
            Reporter: Gary Wu (gary.wu1@huawei.com)