Uploaded image for project: 'controller'
  1. controller
  2. CONTROLLER-1618

Scale testing: akka.pattern.AskTimeoutException ... after [30000 ms]

    XMLWordPrintable

Details

    • Bug
    • Status: Resolved
    • Resolution: Duplicate
    • None
    • None
    • clustering
    • None
    • Operating System: All
      Platform: All

    • 8041

    Description

      Running scale tests on latest Boron code with a 3-node cluster.
      ODL is part of an openstack, with 200 OVSs having a full-mesh of tunnels between them. 1 VM per OVS.
      Everything seemed to work until around 196 VMS, but then creation of more VMs doesn't work anymore, and MD-SAL is throwing a lot of errors.
      In the logs you can see I started the creation in 21/3/2017.
      When I tried to create more VMs on the next day (22/3/2017) at around 11:33:24, I'm starting to get many errors in ODL1.
      ODL 1 interesting errors:
      2017-03-22 11:41:22,908 | ERROR | lt-dispatcher-45 | LocalThreePhaseCommitCohort | 175 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.SNAPSHOT | Failed to prepare transaction member-1-datastore-config-fe-0-txn-28762919 on backend
      akka.pattern.AskTimeoutException: Ask timed out on ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-1-shard-inventory-config#-515521380)] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.ReadyLocalTransaction".
      at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)[157:com.typesafe.akka.actor:2.4.7]
      at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)[157:com.typesafe.akka.actor:2.4.7]
      at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)[153:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
      at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[153:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
      at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)[153:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
      at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)[157:com.typesafe.akka.actor:2.4.7]
      at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)[157:com.typesafe.akka.actor:2.4.7]
      at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)[157:com.typesafe.akka.actor:2.4.7]
      at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)[157:com.typesafe.akka.actor:2.4.7]
      at java.lang.Thread.run(Thread.java:745)[:1.8.0_121]
      2017-03-22 11:41:22,908 | ERROR | lt-dispatcher-59 | LocalThreePhaseCommitCohort | 175 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.SNAPSHOT | Failed to prepare transaction member-1-datastore-config-fe-0-txn-28762921 on backend
      akka.pattern.AskTimeoutException: Ask timed out on ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-1-shard-inventory-config#-515521380)] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.ReadyLocalTransaction".
      at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)[157:com.typesafe.akka.actor:2.4.7]
      at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)[157:com.typesafe.akka.actor:2.4.7]
      at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)[153:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
      at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[153:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
      at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)[153:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
      at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)[157:com.typesafe.akka.actor:2.4.7]
      at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)[157:com.typesafe.akka.actor:2.4.7]
      at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)[157:com.typesafe.akka.actor:2.4.7]
      at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)[157:com.typesafe.akka.actor:2.4.7]
      at java.lang.Thread.run(Thread.java:745)[:1.8.0_121]
      2017-03-22 11:41:22,908 | WARN | t-dispatcher-134 | ConcurrentDOMDataBroker | 175 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.SNAPSHOT | Tx: DOM-54456305 Error during phase CAN_COMMIT, starting Abort
      akka.pattern.AskTimeoutException: Ask timed out on ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-1-shard-inventory-config#-515521380)] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.ReadyLocalTransaction".
      at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)[157:com.typesafe.akka.actor:2.4.7]
      at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)[157:com.typesafe.akka.actor:2.4.7]
      at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)[153:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
      at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[153:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
      at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)[153:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
      at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)[157:com.typesafe.akka.actor:2.4.7]
      at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)[157:com.typesafe.akka.actor:2.4.7]
      at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)[157:com.typesafe.akka.actor:2.4.7]
      at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)[157:com.typesafe.akka.actor:2.4.7]
      at java.lang.Thread.run(Thread.java:745)[:1.8.0_121]

      ODL 3 interesting errors:
      2017-03-22 11:33:27,068 | ERROR | Pool-1-worker-37 | DataStoreJobCoordinator | 283 - org.opendaylight.genius.mdsalutil-api - 0.1.3.SNAPSHOT | Exception when executing jobEntry: JobEntry

      {key='MAC-00:1C:7 3:4E:D3:31 ELAN_TAG-5501DPN_ID-66396557993593', mainWorker=org.opendaylight.netvirt.elan.internal.ElanPacketInHandler$$Lambda$183/1454185117@69186057, rollbackWorker=null, retryCount=0, futures=null}

      java.lang.RuntimeException: java.util.concurrent.ExecutionException: ReadFailedException{message=Error executeRead ReadData for path /(urn:opendaylight:netvirt:elan?revision=2015-06-02)elan-interfaces/elan-inter
      face/elan-interface[

      {(urn:opendaylight:netvirt:elan?revision=2015-06-02)name=tun816c0459241}

      ], errorList=[RpcError [message=Error executeRead ReadData for path /(urn:opendaylight:netvirt:elan?revision=2015-06-02
      )elan-interfaces/elan-interface/elan-interface[

      {(urn:opendaylight:netvirt:elan?revision=2015-06-02)name=tun816c0459241}

      ], severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=nu
      ll, cause=java.lang.Exception: Error creating READ_ONLY transaction on shard default]]}
      at org.opendaylight.genius.mdsalutil.MDSALUtil.read(MDSALUtil.java:512)
      at org.opendaylight.netvirt.elan.utils.ElanUtils.getElanInterfaceByElanInterfaceName(ElanUtils.java:364)
      at org.opendaylight.netvirt.elan.utils.ElanUtils.getEtreeInterfaceByElanInterfaceName(ElanUtils.java:373)
      at org.opendaylight.netvirt.elan.utils.ElanUtils.setupEtreeRemoteDmacFlow(ElanUtils.java:1103)
      at org.opendaylight.netvirt.elan.utils.ElanUtils.setupRemoteDmacFlow(ElanUtils.java:1095)
      at org.opendaylight.netvirt.elan.utils.ElanUtils.setupOrigDmacFlows(ElanUtils.java:954)
      at org.opendaylight.netvirt.elan.utils.ElanUtils.setupMacFlows(ElanUtils.java:738)
      at org.opendaylight.netvirt.elan.internal.ElanPacketInHandler.lambda$enqueueJobForDPNSpecificTasks$1(ElanPacketInHandler.java:198)
      at org.opendaylight.genius.datastoreutils.DataStoreJobCoordinator$MainTask.run(DataStoreJobCoordinator.java:254)
      at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
      at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
      at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
      at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
      at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
      Caused by: java.util.concurrent.ExecutionException: ReadFailedException{message=Error executeRead ReadData for path /(urn:opendaylight:netvirt:elan?revision=2015-06-02)elan-interfaces/elan-interface/elan-interfa
      ce[

      {(urn:opendaylight:netvirt:elan?revision=2015-06-02)name=tun816c0459241}

      ], errorList=[RpcError [message=Error executeRead ReadData for path /(urn:opendaylight:netvirt:elan?revision=2015-06-02)elan-interfaces/
      elan-interface/elan-interface[

      {(urn:opendaylight:netvirt:elan?revision=2015-06-02)name=tun816c0459241}

      ], severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=java.la
      ng.Exception: Error creating READ_ONLY transaction on shard default]]}
      at org.opendaylight.yangtools.util.concurrent.MappingCheckedFuture.wrapInExecutionException(MappingCheckedFuture.java:63)
      at org.opendaylight.yangtools.util.concurrent.MappingCheckedFuture.get(MappingCheckedFuture.java:76)
      at org.opendaylight.genius.mdsalutil.MDSALUtil.read(MDSALUtil.java:510)
      ... 13 more
      Caused by: ReadFailedException{message=Error executeRead ReadData for path /(urn:opendaylight:netvirt:elan?revision=2015-06-02)elan-interfaces/elan-interface/elan-interface[

      {(urn:opendaylight:netvirt:elan?revisi on=2015-06-02)name=tun816c0459241}

      ], errorList=[RpcError [message=Error executeRead ReadData for path /(urn:opendaylight:netvirt:elan?revision=2015-06-02)elan-interfaces/elan-interface/elan-interface[

      {(urn:opend aylight:netvirt:elan?revision=2015-06-02)name=tun816c0459241}

      ], severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=java.lang.Exception: Error creating READ_ONLY tr
      ansaction on shard default]]}
      at org.opendaylight.controller.cluster.datastore.NoOpTransactionContext.executeRead(NoOpTransactionContext.java:67)
      at org.opendaylight.controller.cluster.datastore.TransactionProxy$1.invoke(TransactionProxy.java:93)
      at org.opendaylight.controller.cluster.datastore.TransactionContextWrapper.executePriorTransactionOperations(TransactionContextWrapper.java:132)
      at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.createTransactionContext(RemoteTransactionContextSupport.java:239)
      at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.onCreateTransactionComplete(RemoteTransactionContextSupport.java:202)
      at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.access$000(RemoteTransactionContextSupport.java:39)
      at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport$1.onComplete(RemoteTransactionContextSupport.java:137)
      at akka.dispatch.OnComplete.internal(Future.scala:258)
      at akka.dispatch.OnComplete.internal(Future.scala:256)
      at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
      at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
      at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
      at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
      at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)

      Ran jolokia/read/org.opendaylight.controller:Category=Shards,name=member-X-shard-topology-operational,type=DistributedOperationalDatastore on all ODLs.
      Results:
      ODL1:
      {"request":

      {"mbean":"org.opendaylight.controller:Category=Shards,name=member-1-shard-topology-operational,type=DistributedOperationalDatastore","type":"read"}

      ,"value":

      {"ReadWriteTransactionCount":0,"SnapshotIndex":2645417,"InMemoryJournalLogSize":6,"ReplicatedToAllIndex":2645417,"Leader":"member-3-shard-topology-operational","LastIndex":2645423,"RaftState":"Follower","LastApplied":2645423,"LastCommittedTransactionTime":"1970-01-01 02:00:00.000","LastLogIndex":2645423,"LastLeadershipChangeTime":"2017-03-21 18:06:45.018","PeerAddresses":"member-2-shard-topology-operational: , member-3-shard-topology-operational: akka.tcp:\/\/opendaylight-cluster-data@172.16.1.153:2550\/user\/shardmanager-operational\/member-3-shard-topology-operational","WriteOnlyTransactionCount":0,"FollowerInitialSyncStatus":true,"FollowerInfo":[],"FailedReadTransactionsCount":0,"Voting":true,"StatRetrievalTime":"2.828 ms","CurrentTerm":1,"LastTerm":1,"FailedTransactionsCount":0,"PendingTxCommitQueueSize":0,"VotedFor":"member-3-shard-topology-operational","SnapshotCaptureInitiated":false,"CommittedTransactionsCount":0,"TxCohortCacheSize":0,"PeerVotingStates":"member-2-shard-topology-operational: true, member-3-shard-topology-operational: true","LastLogTerm":1,"StatRetrievalError":null,"CommitIndex":2645423,"SnapshotTerm":1,"AbortTransactionsCount":0,"ReadOnlyTransactionCount":0,"ShardName":"member-1-shard-topology-operational","LeadershipChangeCount":1,"InMemoryJournalDataSize":7452}

      ,"timestamp":1490175963,"status":200}
      ODL2:
      {"request":

      {"mbean":"org.opendaylight.controller:Category=Shards,name=member-2-shard-topology-operational,type=DistributedOperationalDatastore","type":"read"}

      ,"value":

      {"ReadWriteTransactionCount":0,"SnapshotIndex":2645983,"InMemoryJournalLogSize":3,"ReplicatedToAllIndex":2645983,"Leader":"member-3-shard-topology-operational","LastIndex":2645986,"RaftState":"Follower","LastCommittedTransactionTime":"1970-01-01 02:00:00.000","LastApplied":2645985,"LastLogIndex":2645986,"LastLeadershipChangeTime":"2017-03-21 18:06:45.011","PeerAddresses":"member-1-shard-topology-operational: , member-3-shard-topology-operational: akka.tcp:\/\/opendaylight-cluster-data@172.16.1.153:2550\/user\/shardmanager-operational\/member-3-shard-topology-operational","WriteOnlyTransactionCount":0,"FollowerInitialSyncStatus":true,"FollowerInfo":[],"FailedReadTransactionsCount":0,"StatRetrievalTime":"11.85 ms","Voting":true,"CurrentTerm":1,"LastTerm":1,"FailedTransactionsCount":0,"PendingTxCommitQueueSize":0,"VotedFor":"member-3-shard-topology-operational","SnapshotCaptureInitiated":false,"CommittedTransactionsCount":0,"TxCohortCacheSize":0,"PeerVotingStates":"member-1-shard-topology-operational: true, member-3-shard-topology-operational: true","LastLogTerm":1,"StatRetrievalError":null,"CommitIndex":2645985,"SnapshotTerm":1,"AbortTransactionsCount":0,"ReadOnlyTransactionCount":0,"ShardName":"member-2-shard-topology-operational","LeadershipChangeCount":1,"InMemoryJournalDataSize":3726}

      ,"timestamp":1490175977,"status":200}
      ODL3:
      {"request":

      {"mbean":"org.opendaylight.controller:Category=Shards,name=member-3-shard-topology-operational,type=DistributedOperationalDatastore","type":"read"}

      ,"value":{"ReadWriteTransactionCount":881228,"SnapshotIndex":2646308,"InMemoryJournalLogSize":5,"ReplicatedToAllIndex":2646308,"Leader":"member-3-shard-topology-operational","LastIndex":2646313,"RaftState":"Leader","LastCommittedTransactionTime":"2017-03-22 11:46:26.848","LastApplied":2646313,"LastLeadershipChangeTime":"2017-03-21 18:06:44.977","LastLogIndex":2646313,"PeerAddresses":"member-1-shard-topology-operational: akka.tcp:\/\/opendaylight-cluster-data@172.16.1.142:2550\/user\/shardmanager-operational\/member-1-shard-topology-operational, member-2-shard-topology-operational: akka.tcp:\/\/opendaylight-cluster-data@172.16.1.136:2550\/user\/shardmanager-operational\/member-2-shard-topology-operational","FollowerInitialSyncStatus":false,"WriteOnlyTransactionCount":0,"FollowerInfo":[{"timeSinceLastActivity":"00:00:00.070","active":true,"matchIndex":2646308,"voting":

      Attachments

        1. odl1-part6.tar.gz
          30 kB
        2. odl1-part5.tar.gz
          302 kB
        3. odl1-part4.tar.gz
          286 kB
        4. odl1-part3.tar.gz
          918 kB
        5. odl1-part2.tar.gz
          841 kB
        6. odl1-part1.tar.gz
          751 kB
        7. odl3-part2.tar.gz
          412 kB
        8. odl3-part1.tar.gz
          904 kB
        9. odl2.tar.gz
          504 kB

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              Unassigned Unassigned
              guy.sela@hpe.com Guy Sela
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: