[NETVIRT-985] java.lang.OutOfMemoryError: Java heap space Created: 10/Nov/17  Updated: 08/May/18  Resolved: 30/Nov/17

Status: Resolved
Project: netvirt
Component/s: None
Affects Version/s: Carbon
Fix Version/s: Nitrogen, Carbon, Oxygen

Type: Bug Priority: Medium
Reporter: Sai Sindhur Malleni Assignee: Michael Vorburger
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File Controller1_open-transactions.txt     Text File Controller2_open-transactions.txt     Text File Controller3_open-transactions.txt     Zip Archive controller-2__java_pid1_Leak_Suspects.zip     Zip Archive java_pid1_Leak_Suspects.zip     File karaf.log.xz     Text File open-transactions_controller-0.txt     Text File open-transactions_controller-1.txt     Text File open-transactions_controller-2.txt    
Issue Links:
Blocks
blocks NETVIRT-878 CSIT should help to detect possible m... In Progress
blocks GENIUS-102 New OOM due to more TX leaks seen in ... Resolved
blocks OPNFLWPLUG-961 New OOM due to more TX leaks seen in ... Resolved
blocks OVSDB-435 New OOM due to more TX leaks seen in ... Resolved
blocks NETVIRT-883 Umbrella parent issue for grouping al... Resolved
blocks NETVIRT-1010 OOM and other memory issues Resolved
is blocked by NETVIRT-886 Suspected ReadOnlyTransaction leak in... Resolved
is blocked by OVSDB-425 Suspected ReadOnlyTransaction leak in... Resolved
is blocked by OVSDB-426 Suspected ReadOnlyTransaction leak in... Resolved
is blocked by NETVIRT-1000 Transaction Leaks in neutronvpn, fibm... Resolved
Duplicate
is duplicated by NETVIRT-986 java.lang.OutOfMemoryError: GC overhe... Resolved
Relates
relates to CONTROLLER-1756 OOM due to huge Map in ShardDataTree Resolved
relates to CONTROLLER-1787 java.lang.Exception: Error creating R... Resolved
relates to CONTROLLER-1788 Failed to execute RPC (urn:opendaylig... Resolved
relates to OPNFLWPLUG-955 Service group openflow:18128186049282... Resolved
relates to OPNFLWPLUG-956 No queue present, failing request Resolved
relates to OPNFLWPLUG-957 TransactionChainManager... Exception ... Resolved

 Description   

During perf/scale testing, we see OOM on ODL when a large number of neutron resources are created and deleted we see that ODL is killed due to OOM. Looking at the stdout when the JVM crashed due to OOM, we see:

Heap dump file created [3089813876 bytes in 19.232 secs]
Uncaught error from thread [opendaylight-cluster-data-shard-dispatcher-144] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[opendaylight-cluster-data]
java.lang.OutOfMemoryError: Java heap space
at com.google.common.collect.RegularImmutableMap.createEntryArray(RegularImmutableMap.java:148)
at com.google.common.collect.RegularImmutableMap.<init>(RegularImmutableMap.java:81)
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:294)
at org.opendaylight.controller.cluster.datastore.persisted.FrontendHistoryMetadata.<init>(FrontendHistoryMetadata.java:40)
at org.opendaylight.controller.cluster.datastore.FrontendHistoryMetadataBuilder.build(FrontendHistoryMetadataBuilder.java:54)
at org.opendaylight.controller.cluster.datastore.FrontendClientMetadataBuilder$$Lambda$431/741495460.apply(Unknown Source)
at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
at java.util.AbstractCollection.toArray(AbstractCollection.java:141)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:258)
at org.opendaylight.controller.cluster.datastore.persisted.FrontendClientMetadata.<init>(FrontendClientMetadata.java:38)
at org.opendaylight.controller.cluster.datastore.FrontendClientMetadataBuilder.build(FrontendClientMetadataBuilder.java:77)
at org.opendaylight.controller.cluster.datastore.FrontendMetadata$$Lambda$430/2026307982.apply(Unknown Source)
at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
at java.util.AbstractCollection.toArray(AbstractCollection.java:141)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:258)
at org.opendaylight.controller.cluster.datastore.persisted.FrontendShardDataTreeSnapshotMetadata.<init>(FrontendShardDataTreeSnapshotMetadata.java:71)
at org.opendaylight.controller.cluster.datastore.FrontendMetadata.toSnapshot(FrontendMetadata.java:72)
at org.opendaylight.controller.cluster.datastore.FrontendMetadata.toSnapshot(FrontendMetadata.java:33)
at org.opendaylight.controller.cluster.datastore.ShardDataTree.takeStateSnapshot(ShardDataTree.java:216)
at org.opendaylight.controller.cluster.datastore.ShardSnapshotCohort.createSnapshot(ShardSnapshotCohort.java:68)
at org.opendaylight.controller.cluster.raft.RaftActorSnapshotMessageSupport.lambda$new$0(RaftActorSnapshotMessageSupport.java:52)
at org.opendaylight.controller.cluster.raft.RaftActorSnapshotMessageSupport$$Lambda$123/1533883683.accept(Unknown Source)
at org.opendaylight.controller.cluster.raft.SnapshotManager$Idle.capture(SnapshotManager.java:295)
at org.opendaylight.controller.cluster.raft.SnapshotManager$Idle.capture(SnapshotManager.java:307)
at org.opendaylight.controller.cluster.raft.SnapshotManager.capture(SnapshotManager.java:91)
at org.opendaylight.controller.cluster.raft.behaviors.Follower.lambda$handleAppendEntries$0(Follower.java:254)
at org.opendaylight.controller.cluster.raft.behaviors.Follower$$Lambda$127/742332312.apply(Unknown Source)
at org.opendaylight.controller.cluster.raft.ReplicatedLogImpl.lambda$appendAndPersist$0(ReplicatedLogImpl.java:111)
at org.opendaylight.controller.cluster.raft.ReplicatedLogImpl$$Lambda$128/559701765.apply(Unknown Source)
at akka.persistence.UntypedPersistentActor$$anonfun$persist$1.apply(PersistentActor.scala:206)



 Comments   
Comment by Michael Vorburger [ 13/Nov/17 ]

Attached java_pid1_Leak_Suspects.zip from smalleni environment shows this another one like CONTROLLER-1756 ...

Attached is the usual MAT produced Leak Suspects report, from a HPROF from 1st of 3 cluster nodes; the 2nd one show basically the same - 800 MB on 1 GB of total 2 GB due to huge Map in ShardDataTree.

PS: The 3rd node does an OOM with a slightly different HPROF, see attached controller-2__java_pid1_Leak_Suspects.zip, showing 745 MB used by the thread MonitorableThreadFactory$AkkaForkJoinWorkerThread local variables, and 310 MB in org.opendaylight.controller.cluster.datastore.FrontendMetadata. I don't know enough about clustering internals to judge if that is just an impact of above, but unless educated otherwise, will have to suspect it is so.

Comment by Michael Vorburger [ 13/Nov/17 ]

The Karaf log on controller-0 and controller-1 (both with the ShardDataTree OOM) do not contain "IllegalStateException: Attempted to close chain with outstanding transaction" as in CONTROLLER-1756, but a lot of other errors - too many to easily make any sense of them, so far...

Comment by Sai Sindhur Malleni [ 13/Nov/17 ]

Michael, shouldn't you be looking at logs on controller-1 and controller-2 since that is where the ODLs hit OOM.

Comment by Michael Vorburger [ 13/Nov/17 ]

smalleni yeah you're right, I'll pour more through controller-1 log... but FYI you're controller-0 WOULD HAVE (was about to) OOM as well anyway...

Comment by Michael Vorburger [ 13/Nov/17 ]

Grep'ing for "chain" in the attached karaf.log.xz from controller-1 reveals a lot of the following kinds messages again from openflowplugin which I have a hunch may have something to do with this:

1. "Transaction chain failed, recreating chain due to ..." :

2017-11-08 20:26:13,390 | WARN  | CommitFutures-1  | TransactionChainManager          | 304 - org.opendaylight.openflowplugin.common - 0.4.2.Carbon-redhat-2 | Transaction chain failed, recreating chain due to 
OptimisticLockFailedException{message=Optimistic lock failed., errorList=[RpcError [message=Optimistic lock failed., severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was created by other transaction.]]}
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.lambda$processNextPendingTransaction$0(ShardDataTree.java:751)
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPending(ShardDataTree.java:789)
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPendingTransaction(ShardDataTree.java:736)
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.startCanCommit(ShardDataTree.java:819)
        at org.opendaylight.controller.cluster.datastore.SimpleShardDataTreeCohort.canCommit(SimpleShardDataTreeCohort.java:90)
        at org.opendaylight.controller.cluster.datastore.ChainedCommitCohort.canCommit(ChainedCommitCohort.java:58)
(...)
Caused by: org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was created by other transaction.
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkWriteApplicable(SchemaAwareApplyOperation.java:174)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:128)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:305)
(...)

2. "WARN: ConflictingModificationAppliedException" && "ERROR Exception during INITIAL transaction submitting. OptimisticLockFailedException" :

2017-11-09 13:28:33,002 | WARN  | lt-dispatcher-41 | ConcurrentDOMDataBroker          | 211 - org.opendaylight.controller.sal-distributed-datastore - 1.5.2.Carbon-redhat-2 | Tx: DOM-CHAIN-20-0 Error during phase CAN_COMMIT, starting Abort
OptimisticLockFailedException{message=Optimistic lock failed., errorList=[RpcError [message=Optimistic lock failed., severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.ap
i.schema.tree.ConflictingModificationAppliedException: Node children was modified by other transaction]]}
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.lambda$processNextPendingTransaction$0(ShardDataTree.java:751)
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPending(ShardDataTree.java:789)
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPendingTransaction(ShardDataTree.java:736)
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.startCanCommit(ShardDataTree.java:819)
        at org.opendaylight.controller.cluster.datastore.SimpleShardDataTreeCohort.canCommit(SimpleShardDataTreeCohort.java:90)
        at org.opendaylight.controller.cluster.datastore.ChainedCommitCohort.canCommit(ChainedCommitCohort.java:58)
        at org.opendaylight.controller.cluster.datastore.CohortEntry.canCommit(CohortEntry.java:97)
        at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleCanCommit(ShardCommitCoordinator.java:236)
        at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleForwardedReadyTransaction(ShardCommitCoordinator.java:114)
        at org.opendaylight.controller.cluster.datastore.Shard.handleForwardedReadyTransaction(Shard.java:699)
        at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:314)
        at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:270)
        at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:44)
        at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)
        at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104)
        at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
        at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)
        at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:727)
        at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:183)
        at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
        at akka.actor.ActorCell.invoke(ActorCell.scala:495)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
        at akka.dispatch.Mailbox.run(Mailbox.scala:224)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node children was modified by other transaction
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkConflicting(SchemaAwareApplyOperation.java:78)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkNotConflicting(SchemaAwareApplyOperation.java:108)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkWriteApplicable(SchemaAwareApplyOperation.java:170)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:128)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:305)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkMergeApplicable(AbstractNodeContainerModificationStrategy.java:313)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:131)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:305)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkTouchApplicable(AbstractNodeContainerModificationStrategy.java:288)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:125)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.StructuralContainerModificationStrategy.checkApplicable(StructuralContainerModificationStrategy.java:99)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:305)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkTouchApplicable(AbstractNodeContainerModificationStrategy.java:288)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:125)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.RootModificationApplyOperation.checkApplicable(RootModificationApplyOperation.java:72)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractDataTreeTip.validate(AbstractDataTreeTip.java:35)
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.lambda$processNextPendingTransaction$0(ShardDataTree.java:743)
        ... 29 more
2017-11-09 13:28:33,003 | ERROR | ntLoopGroup-7-36 | TransactionChainManager          | 304 - org.opendaylight.openflowplugin.common - 0.4.2.Carbon-redhat-2 | Exception during INITIAL transaction submitting. 
java.util.concurrent.ExecutionException: OptimisticLockFailedException{message=Optimistic lock failed., errorList=[RpcError [message=Optimistic lock failed., severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node children was modified by other transaction]]}
        at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
        at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:272)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)
        at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:69)
        at org.opendaylight.openflowplugin.common.txchain.TransactionChainManager.submitTransaction(TransactionChainManager.java:180)
        at org.opendaylight.openflowplugin.common.txchain.TransactionChainManager.initialSubmitWriteTransaction(TransactionChainManager.java:85)
        at org.opendaylight.openflowplugin.impl.device.DeviceContextImpl.initialSubmitTransaction(DeviceContextImpl.java:241)
        at org.opendaylight.openflowplugin.impl.statistics.StatisticsContextImpl$4.onSuccess(StatisticsContextImpl.java:363)
        at org.opendaylight.openflowplugin.impl.statistics.StatisticsContextImpl$4.onSuccess(StatisticsContextImpl.java:355)
        at com.google.common.util.concurrent.Futures$6.run(Futures.java:1319)
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:185)
        at com.google.common.util.concurrent.Futures$ChainingListenableFuture$1.run(Futures.java:918)
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:185)
        at com.google.common.util.concurrent.Futures$ChainingListenableFuture$1.run(Futures.java:918)
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:185)
        at com.google.common.util.concurrent.Futures$ChainingListenableFuture$1.run(Futures.java:918)
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
        at com.google.common.util.concurrent.Futures$ImmediateFuture.addListener(Futures.java:106)
        at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:914)
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:185)
        at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:53)
        at org.opendaylight.openflowplugin.impl.rpc.AbstractRequestContext.setResult(AbstractRequestContext.java:32)
        at org.opendaylight.openflowplugin.impl.device.listener.MultiMsgCollectorImpl.endCollecting(MultiMsgCollectorImpl.java:70)
        at org.opendaylight.openflowplugin.impl.device.listener.MultiMsgCollectorImpl.addMultipartMsg(MultiMsgCollectorImpl.java:58)
        at org.opendaylight.openflowplugin.impl.services.AbstractMultipartRequestCallback.onSuccess(AbstractMultipartRequestCallback.java:54)
        at org.opendaylight.openflowplugin.impl.services.AbstractMultipartRequestCallback.onSuccess(AbstractMultipartRequestCallback.java:22)
        at org.opendaylight.openflowjava.protocol.impl.core.connection.OutboundQueueEntry.complete(OutboundQueueEntry.java:121)
        at org.opendaylight.openflowjava.protocol.impl.core.connection.StackedSegment.completeEntry(StackedSegment.java:113)
        at org.opendaylight.openflowjava.protocol.impl.core.connection.StackedSegment.pairRequest(StackedSegment.java:153)
        at org.opendaylight.openflowjava.protocol.impl.core.connection.AbstractStackedOutboundQueue.pairRequest(AbstractStackedOutboundQueue.java:203)
        at org.opendaylight.openflowjava.protocol.impl.core.connection.AbstractOutboundQueueManager.onMessage(AbstractOutboundQueueManager.java:208)
        at org.opendaylight.openflowjava.protocol.impl.core.connection.ConnectionAdapterImpl.consumeDeviceMessage(ConnectionAdapterImpl.java:143)
        at org.opendaylight.openflowjava.protocol.impl.core.connection.AbstractConnectionAdapterStatistics.consume(AbstractConnectionAdapterStatistics.java:66)
        at org.opendaylight.openflowjava.protocol.impl.core.connection.ConnectionAdapterImpl.consume(ConnectionAdapterImpl.java:44)
        at org.opendaylight.openflowjava.protocol.impl.core.DelegatingInboundHandler.channelRead(DelegatingInboundHandler.java:43)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
        at org.opendaylight.openflowjava.protocol.impl.core.connection.AbstractOutboundQueueManager.channelRead(AbstractOutboundQueueManager.java:195)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
        at org.opendaylight.openflowjava.protocol.impl.core.IdleHandler.channelRead(IdleHandler.java:39)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:642)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:565)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:479)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:433)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
        at java.lang.Thread.run(Thread.java:748)
Caused by: OptimisticLockFailedException{message=Optimistic lock failed., errorList=[RpcError [message=Optimistic lock failed., severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node children was modified by other transaction]]}
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.lambda$processNextPendingTransaction$0(ShardDataTree.java:751)
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPending(ShardDataTree.java:789)
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.processNextPendingTransaction(ShardDataTree.java:736)
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.startCanCommit(ShardDataTree.java:819)
        at org.opendaylight.controller.cluster.datastore.SimpleShardDataTreeCohort.canCommit(SimpleShardDataTreeCohort.java:90)
        at org.opendaylight.controller.cluster.datastore.ChainedCommitCohort.canCommit(ChainedCommitCohort.java:58)
        at org.opendaylight.controller.cluster.datastore.CohortEntry.canCommit(CohortEntry.java:97)
        at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleCanCommit(ShardCommitCoordinator.java:236)
        at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleForwardedReadyTransaction(ShardCommitCoordinator.java:114)
        at org.opendaylight.controller.cluster.datastore.Shard.handleForwardedReadyTransaction(Shard.java:699)
        at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:314)
        at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:270)
        at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:44)
        at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)
        at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104)
        at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
        at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)
        at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:727)
        at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:183)
        at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
        at akka.actor.ActorCell.invoke(ActorCell.scala:495)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
        at akka.dispatch.Mailbox.run(Mailbox.scala:224)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node children was modified by other transaction
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkConflicting(SchemaAwareApplyOperation.java:78)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkNotConflicting(SchemaAwareApplyOperation.java:108)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkWriteApplicable(SchemaAwareApplyOperation.java:170)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:128)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:305)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkMergeApplicable(AbstractNodeContainerModificationStrategy.java:313)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:131)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:305)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkTouchApplicable(AbstractNodeContainerModificationStrategy.java:288)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:125)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.StructuralContainerModificationStrategy.checkApplicable(StructuralContainerModificationStrategy.java:99)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkChildPreconditions(AbstractNodeContainerModificationStrategy.java:305)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractNodeContainerModificationStrategy.checkTouchApplicable(AbstractNodeContainerModificationStrategy.java:288)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.SchemaAwareApplyOperation.checkApplicable(SchemaAwareApplyOperation.java:125)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.RootModificationApplyOperation.checkApplicable(RootModificationApplyOperation.java:72)
        at org.opendaylight.yangtools.yang.data.impl.schema.tree.AbstractDataTreeTip.validate(AbstractDataTreeTip.java:35)
        at org.opendaylight.controller.cluster.datastore.ShardDataTree.lambda$processNextPendingTransaction$0(ShardDataTree.java:743)
2017-11-09 13:28:33,003 | WARN  | ntLoopGroup-7-36 | ContextChainHolderImpl           | 307 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon-redhat-2 | Not able to set MASTER role on device openflow:123674267060747, reason: Initial transaction cannot be submitted.
2017-11-09 13:28:33,003 | WARN  | ntLoopGroup-7-36 | ContextChainHolderImpl           | 307 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon-redhat-2 | This mastering is mandatory, destroying context chain and closing connection for device openflow:123674267060747.
2017-11-09 13:28:33,003 | INFO  | ntLoopGroup-7-36 | ContextChainImpl                 | 307 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon-redhat-2 | Closed clustering services registration for node openflow:123674267060747

3. "Failed to shut down transaction chain for device" (more in 4.):

2017-11-09 13:28:33,004 | WARN  | ntLoopGroup-7-36 | DeviceContextImpl                | 307 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon-redhat-2 | Failed to shut down transaction chain for device openflow:123674267060747: java.util.concurrent.ExecutionException: OptimisticLockFailedException{message=Optimistic lock failed., errorList=[RpcError [message=Optimistic lock failed., severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node children was modified by other transaction]]}
Comment by Michael Vorburger [ 13/Nov/17 ]

4. "ShardLeaderNotRespondingException: Could not create a READ_WRITE transaction on shard default. The shard leader isn't responding." & "Failed to shut down transaction chain for device openflow:":

2017-11-09 20:14:30,886 | ERROR | lt-dispatcher-46 | TransactionChainProxy            | 211 - org.opendaylight.controller.sal-distributed-datastore - 1.5.2.Carbon-redhat-2 | Tx: member-1-datastore-operational-fe-1-chn-122-txn-86-0 - ready future failed f
or previous Tx member-1-datastore-operational-fe-1-chn-122-txn-86-0
2017-11-09 20:14:30,886 | WARN  | lt-dispatcher-78 | ConcurrentDOMDataBroker          | 211 - org.opendaylight.controller.sal-distributed-datastore - 1.5.2.Carbon-redhat-2 | Tx: DOM-CHAIN-121-85 Error during phase CAN_COMMIT, starting Abort
akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@172.16.0.22:2550/), Path(/user/shardmanager-operational/member-0-shard-default-operational/shard-default-member-1:datastore-operational@1:122-8
5_4948293#1836430803)]] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.BatchedModifications".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)[185:com.typesafe.akka.actor:2.4.18]
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)[185:com.typesafe.akka.actor:2.4.18]
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)[185:com.typesafe.akka.actor:2.4.18]
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)[185:com.typesafe.akka.actor:2.4.18]
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)[185:com.typesafe.akka.actor:2.4.18]
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)[185:com.typesafe.akka.actor:2.4.18]
        at java.lang.Thread.run(Thread.java:748)[:1.8.0_151]
2017-11-09 20:14:30,886 | WARN  | CommitFutures-64 | TransactionChainManager          | 304 - org.opendaylight.openflowplugin.common - 0.4.2.Carbon-redhat-2 | Transaction chain failed, recreating chain due to 
TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@172.16.0.22:2550/), Path(/user/shardmanager-operational/member-0-shard-default-operational/shard-default-member-1:datastore-operational@1:122-85_4948293#1836430803)]] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.BatchedModifications".]]}
        at org.opendaylight.controller.md.sal.dom.broker.impl.TransactionCommitFailedExceptionMapper.newWithCause(TransactionCommitFailedExceptionMapper.java:37)[171:org.opendaylight.controller.sal-broker-impl:1.5.2.Carbon-redhat-2]
        at org.opendaylight.controller.md.sal.dom.broker.impl.TransactionCommitFailedExceptionMapper.newWithCause(TransactionCommitFailedExceptionMapper.java:18)[171:org.opendaylight.controller.sal-broker-impl:1.5.2.Carbon-redhat-2]
        at org.opendaylight.yangtools.util.concurrent.ExceptionMapper.apply(ExceptionMapper.java:102)[69:org.opendaylight.yangtools.util:1.1.2.Carbon-redhat-2]
        at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker.handleException(ConcurrentDOMDataBroker.java:231)[211:org.opendaylight.controller.sal-distributed-datastore:1.5.2.Carbon-redhat-2]
        at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker.access$000(ConcurrentDOMDataBroker.java:46)[211:org.opendaylight.controller.sal-distributed-datastore:1.5.2.Carbon-redhat-2]
        at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker$1.onFailure(ConcurrentDOMDataBroker.java:126)[211:org.opendaylight.controller.sal-distributed-datastore:1.5.2.Carbon-redhat-2]
        at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)[65:com.google.guava:18.0.0]
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)[65:com.google.guava:18.0.0]
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)[65:com.google.guava:18.0.0]
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)[65:com.google.guava:18.0.0]
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)[65:com.google.guava:18.0.0]
        at com.google.common.util.concurrent.SettableFuture.setException(SettableFuture.java:68)[65:com.google.guava:18.0.0]
        at org.opendaylight.controller.cluster.datastore.SingleCommitCohortProxy$1.onComplete(SingleCommitCohortProxy.java:58)[211:org.opendaylight.controller.sal-distributed-datastore:1.5.2.Carbon-redhat-2]
        at akka.dispatch.OnComplete.internal(Future.scala:258)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.OnComplete.internal(Future.scala:256)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)[185:com.typesafe.akka.actor:2.4.18]
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)[185:com.typesafe.akka.actor:2.4.18]
        at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)[185:com.typesafe.akka.actor:2.4.18]
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@172.16.0.22:2550/), Path(/user/shardmanager-operational/member-0-shard-default-operational/shard-default-member-1:datastore-operational@1:122-85_4948293#1836430803)]] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.BatchedModifications".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)[185:com.typesafe.akka.actor:2.4.18]
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)[185:com.typesafe.akka.actor:2.4.18]
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)[185:com.typesafe.akka.actor:2.4.18]
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)[185:com.typesafe.akka.actor:2.4.18]
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)[185:com.typesafe.akka.actor:2.4.18]
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)[185:com.typesafe.akka.actor:2.4.18]
        at java.lang.Thread.run(Thread.java:748)[:1.8.0_151]
2017-11-09 20:14:30,886 | ERROR | CommitFutures-64 | TransactionChainManager          | 304 - org.opendaylight.openflowplugin.common - 0.4.2.Carbon-redhat-2 | Transaction commit failed. 
TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@172.16.0.22:2550/), Path(/user/shardmanager-operational/member-0-shard-default-operational/shard-default-member-1:datastore-operational@1:1
122-85_4948293#1836430803)]] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.BatchedModifications".]]}
        at org.opendaylight.controller.md.sal.dom.broker.impl.TransactionCommitFailedExceptionMapper.newWithCause(TransactionCommitFailedExceptionMapper.java:37)[171:org.opendaylight.controller.sal-broker-impl:1.5.2.Carbon-redhat-2]
        at org.opendaylight.controller.md.sal.dom.broker.impl.TransactionCommitFailedExceptionMapper.newWithCause(TransactionCommitFailedExceptionMapper.java:18)[171:org.opendaylight.controller.sal-broker-impl:1.5.2.Carbon-redhat-2]
        at org.opendaylight.yangtools.util.concurrent.ExceptionMapper.apply(ExceptionMapper.java:102)[69:org.opendaylight.yangtools.util:1.1.2.Carbon-redhat-2]
        at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker.handleException(ConcurrentDOMDataBroker.java:231)[211:org.opendaylight.controller.sal-distributed-datastore:1.5.2.Carbon-redhat-2]
        at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker.access$000(ConcurrentDOMDataBroker.java:46)[211:org.opendaylight.controller.sal-distributed-datastore:1.5.2.Carbon-redhat-2]
        at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker$1.onFailure(ConcurrentDOMDataBroker.java:126)[211:org.opendaylight.controller.sal-distributed-datastore:1.5.2.Carbon-redhat-2]
        at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)[65:com.google.guava:18.0.0]
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)[65:com.google.guava:18.0.0]
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)[65:com.google.guava:18.0.0]
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)[65:com.google.guava:18.0.0]
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)[65:com.google.guava:18.0.0]
        at com.google.common.util.concurrent.SettableFuture.setException(SettableFuture.java:68)[65:com.google.guava:18.0.0]
        at org.opendaylight.controller.cluster.datastore.SingleCommitCohortProxy$1.onComplete(SingleCommitCohortProxy.java:58)[211:org.opendaylight.controller.sal-distributed-datastore:1.5.2.Carbon-redhat-2]
        at akka.dispatch.OnComplete.internal(Future.scala:258)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.OnComplete.internal(Future.scala:256)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)[185:com.typesafe.akka.actor:2.4.18]
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)[185:com.typesafe.akka.actor:2.4.18]
        at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)[185:com.typesafe.akka.actor:2.4.18]
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@172.16.0.22:2550/), Path(/user/shardmanager-operational/member-0-shard-default-operational/shard-default-member-1:datastore-operational@1:122-85_4948293#1836430803)]] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.BatchedModifications".
:
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)[185:com.typesafe.akka.actor:2.4.18]
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)[185:com.typesafe.akka.actor:2.4.18]
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)[185:com.typesafe.akka.actor:2.4.18]
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)[185:com.typesafe.akka.actor:2.4.18]
        at java.lang.Thread.run(Thread.java:748)[:1.8.0_151]
2017-11-09 20:14:30,886 | ERROR | CommitFutures-64 | TransactionChainManager          | 304 - org.opendaylight.openflowplugin.common - 0.4.2.Carbon-redhat-2 | Transaction commit failed. 
TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@172.16.0.22:2550/), Path(/user/shardmanager-operational/member-0-shard-default-operational/shard-default-member-1:datastore-operational@1:122-85_4948293#1836430803)]] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.BatchedModifications".]]}
        at org.opendaylight.controller.md.sal.dom.broker.impl.TransactionCommitFailedExceptionMapper.newWithCause(TransactionCommitFailedExceptionMapper.java:37)[171:org.opendaylight.controller.sal-broker-impl:1.5.2.Carbon-redhat-2]
        at org.opendaylight.controller.md.sal.dom.broker.impl.TransactionCommitFailedExceptionMapper.newWithCause(TransactionCommitFailedExceptionMapper.java:18)[171:org.opendaylight.controller.sal-broker-impl:1.5.2.Carbon-redhat-2]
        at org.opendaylight.yangtools.util.concurrent.ExceptionMapper.apply(ExceptionMapper.java:102)[69:org.opendaylight.yangtools.util:1.1.2.Carbon-redhat-2]
        at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker.handleException(ConcurrentDOMDataBroker.java:231)[211:org.opendaylight.controller.sal-distributed-datastore:1.5.2.Carbon-redhat-2]
        at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker.access$000(ConcurrentDOMDataBroker.java:46)[211:org.opendaylight.controller.sal-distributed-datastore:1.5.2.Carbon-redhat-2]
        at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker$1.onFailure(ConcurrentDOMDataBroker.java:126)[211:org.opendaylight.controller.sal-distributed-datastore:1.5.2.Carbon-redhat-2]
        at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)[65:com.google.guava:18.0.0]
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)[65:com.google.guava:18.0.0]
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)[65:com.google.guava:18.0.0]
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)[65:com.google.guava:18.0.0]
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)[65:com.google.guava:18.0.0]
        at com.google.common.util.concurrent.SettableFuture.setException(SettableFuture.java:68)[65:com.google.guava:18.0.0]
        at org.opendaylight.controller.cluster.datastore.SingleCommitCohortProxy$1.onComplete(SingleCommitCohortProxy.java:58)[211:org.opendaylight.controller.sal-distributed-datastore:1.5.2.Carbon-redhat-2]
        at akka.dispatch.OnComplete.internal(Future.scala:258)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.OnComplete.internal(Future.scala:256)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)[185:com.typesafe.akka.actor:2.4.18]
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)[181:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
        at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)[185:com.typesafe.akka.actor:2.4.18]
        at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)[185:com.typesafe.akka.actor:2.4.18]
(...)
2017-11-09 20:14:32,467 | WARN  | lt-dispatcher-19 | ConcurrentDOMDataBroker          | 211 - org.opendaylight.controller.sal-distributed-datastore - 1.5.2.Carbon-redhat-2 | Tx: DOM-CHAIN-171-0 Error during phase CAN_COMMIT, starting Abort
org.opendaylight.controller.cluster.datastore.exceptions.ShardLeaderNotRespondingException: Could not create a READ_WRITE transaction on shard default. The shard leader isn't responding.
        at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.createTransactionContext(RemoteTransactionContextSupport.java:213)[211:org.opendaylight.controller.sal-distributed-datastore:1.5.2.Carbon-redhat-2]
        at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.onCreateTransactionComplete(RemoteTransactionContextSupport.java:195)[211:org.opendaylight.controller.sal-distributed-datastore:1.5.2.Carbon-redhat-2]
        at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport.access$000(RemoteTransactionContextSupport.java:39)[211:org.opendaylight.controller.sal-distributed-datastore:1.5.2.Carbon-redhat-2]
        at org.opendaylight.controller.cluster.datastore.RemoteTransactionContextSupport$1.onComplete(RemoteTransactionContextSupport.java:135)[211:org.opendaylight.controller.sal-distributed-datastore:1.5.2.Carbon-redhat-2]
(...)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@172.16.0.22:2550/), Path(/user/shardmanager-operational/member-0-shard-default-operational)]] after [5000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.CreateTransaction".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)[185:com.typesafe.akka.actor:2.4.18]
(...)
2017-11-09 20:14:32,467 | WARN  | CommitFutures-66 | PingPongTransactionChain         | 171 - org.opendaylight.controller.sal-broker-impl - 1.5.2.Carbon-redhat-2 | Transaction chain org.opendaylight.controller.cluster.databroker.DOMBrokerTransactionChain@5770b26 failed with no pending transactions
(...)
2017-11-09 20:14:34,170 | WARN  | ntLoopGroup-7-40 | DeviceContextImpl                | 307 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon-redhat-2 | Failed to shut down transaction chain for device openflow:123674267060747: java.util.concurrent.TimeoutException: Timeout waiting for task.

I'll reach out to openflowplugin-dev & controller-dev again to see if any of these means anything to them...

Comment by Michael Vorburger [ 15/Nov/17 ]

> I'll reach out to openflowplugin-dev & controller-dev again to see if any of these means anything to them...

On https://lists.opendaylight.org/pipermail/controller-dev/2017-November/014064.html rovarga says "a large FrontendMetadata part could indicate dangling
transactions or similar".

The attached open-transactions_controller-*.txt, produced by the trace:transaction CLI Command (see CONTROLLER-1760), identify a number of "transaction leaks". I had raised at least some of them already in NETVIRT-883, and will take this issue as to opportunity to fix as many of them as I can (with reasonable effort), hoping that ths will help fix this problem... I'm NOT 100% sure it will, but probably worth a try getting them in and then re-testing this (and otherwise it's useful clean up to do, anyway).

Comment by Michael Vorburger [ 15/Nov/17 ]

https://git.opendaylight.org/gerrit/#/q/topic:NETVIRT-985 ...

Comment by Michael Vorburger [ 17/Nov/17 ]

Once everything that is open now, PLUS what skitt will help me with in NETVIRT-1000, are available in stable/carbon, then smalleni can retest. (Full disclosure: There are few minor Tx leaks, showing up with much fewer transactions than the biggies, which we'll fix only on master, not on carbon & nitrogen.)

Comment by Michael Vorburger [ 28/Nov/17 ]

Everything done here is now merged to master & carbon; and nitrogen should get merged in the coming days (stable/nitrogen was re-open'd today). In attached Controller[1-3]_open-transactions.txt from latest test with a build including these fixes shows, we can clearly see that all the "big" leaks (as in with hundreds of open TXs) have been plugged; as expected. Preliminary early feedback indicates that they are not hitting an OOM anymore.

 
There ARE still some open TX, but relatively few, compared to where we started; I don't think it's worth following up on them, in the short term. In case we ever want to look at this again in the future, see new attachments.

Comment by Michael Vorburger [ 29/Nov/17 ]

Testing of another scenario ("nova boot") has hit an OOM that looks exactly this again (another huge 857 MB of 1.6 GB Map inside the MD SAL ShardDataTree), so clogging these TX leaks is likely going to be more of an ongoing repetitive than a one time action..

Basically, any time we test new scenarios at scale on a path that hasn't been threaded before, if we hit an OOM that shows a blown up ShardDataTree, we have to run trace:transactions and clog more non-closed TXs - best by using the ManagedTransactionRunner, continuing on the repetitive pattern of fixes which the patches of the past 2 weeks on the https://git.opendaylight.org/gerrit/#/q/topic:NETVIRT-985 have shown.

Comment by Michael Vorburger [ 30/Nov/17 ]

Closing this issue to avoid confusing, carrying on more similar work in different places in OVSDB-435.

Generated at Wed Feb 07 20:22:57 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.