Details
- Bug
- Status: Resolved
- Resolution: Done
- Post-Helium
- None
- None
- Operating System: All
- Platform: All
- 3329
Description
I set up a simple 3-node cluster and ran the flow_config_blaster.py script from the integration repo against it. The script creates flows using multiple threads. I found that whenever I use multiple threads (e.g. 10), a substantial majority of the add-flow requests fail: it appears that only the first request succeeds, while the other concurrent requests get stuck. This is with the latest integration build from master.
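For readers without the integration repo at hand, the reproduction pattern can be sketched as below. This is a minimal, hypothetical sketch of concurrent add-flow via RESTCONF, not the actual contents of flow_config_blaster.py; the URL layout, payload shape, and function names are assumptions for illustration.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Assumed RESTCONF config-datastore URL layout; illustrative only.
BASE = "http://{host}:8181/restconf/config/opendaylight-inventory:nodes"

def flow_url(host, node_id, table_id, flow_id):
    """Build the per-flow RESTCONF URL (hypothetical layout)."""
    return (BASE.format(host=host)
            + "/node/{}/table/{}/flow/{}".format(node_id, table_id, flow_id))

def flow_body(flow_id, table_id):
    """Minimal flow payload; a real payload also carries match/instructions."""
    return json.dumps({"flow": [{"id": str(flow_id),
                                 "table_id": table_id,
                                 "priority": 2}]})

def add_flow(session, host, node_id, table_id, flow_id):
    """PUT one flow; each worker thread calls this concurrently."""
    resp = session.put(flow_url(host, node_id, table_id, flow_id),
                       data=flow_body(flow_id, table_id),
                       headers={"Content-Type": "application/json"})
    return resp.status_code

def blast(session, host, node_id, n_flows, n_threads=10):
    """Issue n_flows add-flow requests across n_threads worker threads.

    With n_threads=1 the adds succeed; with n_threads=10 most time out,
    which is the behaviour this report describes.
    """
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(add_flow, session, host, node_id, 0, i)
                   for i in range(n_flows)]
        return [f.result() for f in futures]
```

Run with, e.g., a `requests.Session()` against the cluster leader: `blast(session, "10.0.0.1", "openflow:1", n_flows=100, n_threads=10)`.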
The leader node produces log entries like these:
2015-05-20 13:07:18,443 | WARN | lt-dispatcher-17 | ConcurrentDOMDataBroker | 212 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Tx: DOM-336 Error during phase CAN_COMMIT, starting Abort
akka.pattern.AskTimeoutException: Ask timed out on ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-1-shard-inventory-config#-1821580364)] after [5000 ms]
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)[197:com.typesafe.akka.actor:2.3.10]
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)[197:com.typesafe.akka.actor:2.3.10]
at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)[194:org.scala-lang.scala-library:2.10.4.v20140209-180020-VFINAL-b66a39653b]
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)[194:org.scala-lang.scala-library:2.10.4.v20140209-180020-VFINAL-b66a39653b]
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)[197:com.typesafe.akka.actor:2.3.10]
at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)[197:com.typesafe.akka.actor:2.3.10]
at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)[197:com.typesafe.akka.actor:2.3.10]
at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)[197:com.typesafe.akka.actor:2.3.10]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_75]
2015-05-20 13:07:18,446 | ERROR | lt-dispatcher-17 | LocalThreePhaseCommitCohort | 212 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Failed to prepare transaction member-1-txn-344 on backend
akka.pattern.AskTimeoutException: Ask timed out on ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-1-shard-inventory-config#-1821580364)] after [5000 ms]
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)[197:com.typesafe.akka.actor:2.3.10]
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)[197:com.typesafe.akka.actor:2.3.10]
at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)[194:org.scala-lang.scala-library:2.10.4.v20140209-180020-VFINAL-b66a39653b]
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)[194:org.scala-lang.scala-library:2.10.4.v20140209-180020-VFINAL-b66a39653b]
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)[197:com.typesafe.akka.actor:2.3.10]
at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)[197:com.typesafe.akka.actor:2.3.10]
at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)[197:com.typesafe.akka.actor:2.3.10]
at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)[197:com.typesafe.akka.actor:2.3.10]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_75]
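The 5000 ms in the AskTimeoutException matches the distributed datastore's default per-operation ask timeout. As a diagnostic step (not a fix for the underlying contention), that timeout can be raised in the clustered datastore configuration; the file path and key name below are assumptions that should be verified against the installed distribution:

```
# etc/org.opendaylight.controller.cluster.datastore.cfg (assumed location)
# Raise the per-operation ask timeout from the 5 s default (assumed key name).
operation-timeout-in-seconds=30
```

If requests still stall with a larger timeout, the problem is backpressure or serialization in the shard commit path rather than a too-tight timeout.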
The follower nodes, meanwhile, produce log entries like this:
2015-05-20 13:07:14,093 | WARN | lt-dispatcher-17 | Shard | 205 - org.opendaylight.controller.sal-akka-raft - 1.2.0.SNAPSHOT | ApplyState took more time than expected. Elapsed Time = 72 ms ApplyState = ApplyState
{identifier='null', replicatedLogEntry.index =1771, startTime=1828178749622}

Sequential add-flow (i.e. one thread only) works fine. Multithreaded add-flow also works fine when the cluster has only a single node.