Details
-
Bug
-
Status: Resolved
-
Resolution: Duplicate
-
None
-
None
-
None
-
Operating System: All
Platform: All
-
8041
Description
Running scale tests on latest Boron code with a 3-node cluster.
ODL is part of an openstack, with 200 OVSs having a full-mesh of tunnels between them. 1 VM per OVS.
Everything seemed to work until around 196 VMS, but then creation of more VMs doesn't work anymore, and MD-SAL is throwing a lot of errors.
In the logs you can see I started the creation in 21/3/2017.
When I tried to create more VMs on the next day (22/3/2017) at around 11:33:24, I'm starting to get many errors in ODL1.
ODL 1 interesting errors:
2017-03-22 11:41:22,908 | ERROR | lt-dispatcher-45 | LocalThreePhaseCommitCohort | 175 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.SNAPSHOT | Failed to prepare transaction member-1-datastore-config-fe-0-txn-28762919 on backend
akka.pattern.AskTimeoutException: Ask timed out on ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-1-shard-inventory-config#-515521380)] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.ReadyLocalTransaction".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)[157:com.typesafe.akka.actor:2.4.7]
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)[157:com.typesafe.akka.actor:2.4.7]
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)[153:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[153:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)[153:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)[157:com.typesafe.akka.actor:2.4.7]
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)[157:com.typesafe.akka.actor:2.4.7]
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)[157:com.typesafe.akka.actor:2.4.7]
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)[157:com.typesafe.akka.actor:2.4.7]
at java.lang.Thread.run(Thread.java:745)[:1.8.0_121]
2017-03-22 11:41:22,908 | ERROR | lt-dispatcher-59 | LocalThreePhaseCommitCohort | 175 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.SNAPSHOT | Failed to prepare transaction member-1-datastore-config-fe-0-txn-28762921 on backend
akka.pattern.AskTimeoutException: Ask timed out on ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-1-shard-inventory-config#-515521380)] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.ReadyLocalTransaction".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)[157:com.typesafe.akka.actor:2.4.7]
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)[157:com.typesafe.akka.actor:2.4.7]
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)[153:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[153:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)[153:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)[157:com.typesafe.akka.actor:2.4.7]
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)[157:com.typesafe.akka.actor:2.4.7]
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)[157:com.typesafe.akka.actor:2.4.7]
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)[157:com.typesafe.akka.actor:2.4.7]
at java.lang.Thread.run(Thread.java:745)[:1.8.0_121]
2017-03-22 11:41:22,908 | WARN | t-dispatcher-134 | ConcurrentDOMDataBroker | 175 - org.opendaylight.controller.sal-distributed-datastore - 1.4.3.SNAPSHOT | Tx: DOM-54456305 Error during phase CAN_COMMIT, starting Abort
akka.pattern.AskTimeoutException: Ask timed out on ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-1-shard-inventory-config#-515521380)] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.ReadyLocalTransaction".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)[157:com.typesafe.akka.actor:2.4.7]
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)[157:com.typesafe.akka.actor:2.4.7]
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)[153:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[153:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)[153:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)[157:com.typesafe.akka.actor:2.4.7]
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)[157:com.typesafe.akka.actor:2.4.7]
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)[157:com.typesafe.akka.actor:2.4.7]
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)[157:com.typesafe.akka.actor:2.4.7]
at java.lang.Thread.run(Thread.java:745)[:1.8.0_121]
ODL 3 interesting errors:
2017-03-22 11:33:27,068 | ERROR | Pool-1-worker-37 | DataStoreJobCoordinator | 283 - org.opendaylight.genius.mdsalutil-api - 0.1.3.SNAPSHOT | Exception when executing jobEntry: JobEntry
java.lang.RuntimeException: java.util.concurrent.ExecutionException: ReadFailedException{message=Error executeRead ReadData for path /(urn:opendaylight:netvirt:elan?revision=2015-06-02)elan-interfaces/elan-inter
face/elan-interface[
], errorList=[RpcError [message=Error executeRead ReadData for path /(urn:opendaylight:netvirt:elan?revision=2015-06-02
)elan-interfaces/elan-interface/elan-interface[
], severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=nu
ll, cause=java.lang.Exception: Error creating READ_ONLY transaction on shard default]]}
at org.opendaylight.genius.mdsalutil.MDSALUtil.read(MDSALUtil.java:512)
at org.opendaylight.netvirt.elan.utils.ElanUtils.getElanInterfaceByElanInterfaceName(ElanUtils.java:364)
at org.opendaylight.netvirt.elan.utils.ElanUtils.getEtreeInterfaceByElanInterfaceName(ElanUtils.java:373)
at org.opendaylight.netvirt.elan.utils.ElanUtils.setupEtreeRemoteDmacFlow(ElanUtils.java:1103)
at org.opendaylight.netvirt.elan.utils.ElanUtils.setupRemoteDmacFlow(ElanUtils.java:1095)
at org.opendaylight.netvirt.elan.utils.ElanUtils.setupOrigDmacFlows(ElanUtils.java:954)
at org.opendaylight.netvirt.elan.utils.ElanUtils.setupMacFlows(ElanUtils.java:738)
at org.opendaylight.netvirt.elan.internal.ElanPacketInHandler.lambda$enqueueJobForDPNSpecificTasks$1(ElanPacketInHandler.java:198)
at org.opendaylight.genius.datastoreutils.DataStoreJobCoordinator$MainTask.run(DataStoreJobCoordinator.java:254)
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: java.util.concurrent.ExecutionException: ReadFailedException{message=Error executeRead ReadData for path /(urn:opendaylight:netvirt:elan?revision=2015-06-02)elan-interfaces/elan-interface/elan-interfa
ce[
], errorList=[RpcError [message=Error executeRead ReadData for path /(urn:opendaylight:netvirt:elan?revision=2015-06-02)elan-interfaces/
elan-interface/elan-interface[
], severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=java.la
ng.Exception: Error creating READ_ONLY transaction on shard default]]}
at org.opendaylight.yangtools.util.concurrent.MappingCheckedFuture.wrapInExecutionException(MappingCheckedFuture.java:63)
at org.opendaylight.yangtools.util.concurrent.MappingCheckedFuture.get(MappingCheckedFuture.java:76)
at org.opendaylight.genius.mdsalutil.MDSALUtil.read(MDSALUtil.java:510)
... 13 more
Caused by: ReadFailedException{message=Error executeRead ReadData for path /(urn:opendaylight:netvirt:elan?revision=2015-06-02)elan-interfaces/elan-interface/elan-interface[
], errorList=[RpcError [message=Error executeRead ReadData for path /(urn:opendaylight:netvirt:elan?revision=2015-06-02)elan-interfaces/elan-interface/elan-interface[
{(urn:opend aylight:netvirt:elan?revision=2015-06-02)name=tun816c0459241}], severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=java.lang.Exception: Error creating READ_ONLY tr
ansaction on shard default]]}
at org.opendaylight.controller.cluster.datastore.NoOpTransactionContext.executeRead(NoOpTransactionContext.java:67)
at org.opendaylight.controller.cluster.datastore.TransactionProxy$1.invoke(TransactionProxy.java:93)
at org.opendaylight.controller.cluster.datastore.TransactionContextWrapper.executePriorTransactionOperations(TransactionContextWrapper.java:132)
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.createTransactionContext(RemoteTransactionContextSupport.java:239)
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.onCreateTransactionComplete(RemoteTransactionContextSupport.java:202)
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.access$000(RemoteTransactionContextSupport.java:39)
at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport$1.onComplete(RemoteTransactionContextSupport.java:137)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
Ran jolokia/read/org.opendaylight.controller:Category=Shards,name=member-X-shard-topology-operational,type=DistributedOperationalDatastore on all ODLs.
Results:
ODL1:
{"request":
,"value":
{"ReadWriteTransactionCount":0,"SnapshotIndex":2645417,"InMemoryJournalLogSize":6,"ReplicatedToAllIndex":2645417,"Leader":"member-3-shard-topology-operational","LastIndex":2645423,"RaftState":"Follower","LastApplied":2645423,"LastCommittedTransactionTime":"1970-01-01 02:00:00.000","LastLogIndex":2645423,"LastLeadershipChangeTime":"2017-03-21 18:06:45.018","PeerAddresses":"member-2-shard-topology-operational: , member-3-shard-topology-operational: akka.tcp:\/\/opendaylight-cluster-data@172.16.1.153:2550\/user\/shardmanager-operational\/member-3-shard-topology-operational","WriteOnlyTransactionCount":0,"FollowerInitialSyncStatus":true,"FollowerInfo":[],"FailedReadTransactionsCount":0,"Voting":true,"StatRetrievalTime":"2.828 ms","CurrentTerm":1,"LastTerm":1,"FailedTransactionsCount":0,"PendingTxCommitQueueSize":0,"VotedFor":"member-3-shard-topology-operational","SnapshotCaptureInitiated":false,"CommittedTransactionsCount":0,"TxCohortCacheSize":0,"PeerVotingStates":"member-2-shard-topology-operational: true, member-3-shard-topology-operational: true","LastLogTerm":1,"StatRetrievalError":null,"CommitIndex":2645423,"SnapshotTerm":1,"AbortTransactionsCount":0,"ReadOnlyTransactionCount":0,"ShardName":"member-1-shard-topology-operational","LeadershipChangeCount":1,"InMemoryJournalDataSize":7452},"timestamp":1490175963,"status":200}
ODL2:
{"request":
,"value":
{"ReadWriteTransactionCount":0,"SnapshotIndex":2645983,"InMemoryJournalLogSize":3,"ReplicatedToAllIndex":2645983,"Leader":"member-3-shard-topology-operational","LastIndex":2645986,"RaftState":"Follower","LastCommittedTransactionTime":"1970-01-01 02:00:00.000","LastApplied":2645985,"LastLogIndex":2645986,"LastLeadershipChangeTime":"2017-03-21 18:06:45.011","PeerAddresses":"member-1-shard-topology-operational: , member-3-shard-topology-operational: akka.tcp:\/\/opendaylight-cluster-data@172.16.1.153:2550\/user\/shardmanager-operational\/member-3-shard-topology-operational","WriteOnlyTransactionCount":0,"FollowerInitialSyncStatus":true,"FollowerInfo":[],"FailedReadTransactionsCount":0,"StatRetrievalTime":"11.85 ms","Voting":true,"CurrentTerm":1,"LastTerm":1,"FailedTransactionsCount":0,"PendingTxCommitQueueSize":0,"VotedFor":"member-3-shard-topology-operational","SnapshotCaptureInitiated":false,"CommittedTransactionsCount":0,"TxCohortCacheSize":0,"PeerVotingStates":"member-1-shard-topology-operational: true, member-3-shard-topology-operational: true","LastLogTerm":1,"StatRetrievalError":null,"CommitIndex":2645985,"SnapshotTerm":1,"AbortTransactionsCount":0,"ReadOnlyTransactionCount":0,"ShardName":"member-2-shard-topology-operational","LeadershipChangeCount":1,"InMemoryJournalDataSize":3726},"timestamp":1490175977,"status":200}
ODL3:
{"request":
,"value":{"ReadWriteTransactionCount":881228,"SnapshotIndex":2646308,"InMemoryJournalLogSize":5,"ReplicatedToAllIndex":2646308,"Leader":"member-3-shard-topology-operational","LastIndex":2646313,"RaftState":"Leader","LastCommittedTransactionTime":"2017-03-22 11:46:26.848","LastApplied":2646313,"LastLeadershipChangeTime":"2017-03-21 18:06:44.977","LastLogIndex":2646313,"PeerAddresses":"member-1-shard-topology-operational: akka.tcp:\/\/opendaylight-cluster-data@172.16.1.142:2550\/user\/shardmanager-operational\/member-1-shard-topology-operational, member-2-shard-topology-operational: akka.tcp:\/\/opendaylight-cluster-data@172.16.1.136:2550\/user\/shardmanager-operational\/member-2-shard-topology-operational","FollowerInitialSyncStatus":false,"WriteOnlyTransactionCount":0,"FollowerInfo":[{"timeSinceLastActivity":"00:00:00.070","active":true,"matchIndex":2646308,"voting":
Attachments
Issue Links
- duplicates
-
CONTROLLER-1483 akka.pattern.AskTimeoutException on follower while BGP peer introduces 1M prefixes
- Resolved