[OPNFLWPLUG-863] ERRORs during the OpenFlow cluster performance test Created: 06/Mar/17  Updated: 27/Sep/21  Resolved: 30/Mar/17

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Luis Gomez Assignee: Robert Varga
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 7901

 Description   

We have OF cluster perf test (carbon) failing often (stats not properly collected).

The test uses REST script to inject 64K flows in 32 switches (2K flows/switch) in bulks of 200 flows/REST request. REST requests are sent to inventory config follower while switches are connected to the other follower (3 node cluster).

Following errors are seen in the instance connecting the switches:

2017-03-02 12:14:59,066 | ERROR | lt-dispatcher-47 | EndpointWriter | 199 - com.typesafe.akka.slf4j - 2.4.17 | Failed to serialize remote message [class akka.actor.ActorSelectionMessage] using serializer [class akka.remote.serialization.MessageContainerSerializer]. Transient association error (association remains live)
akka.remote.MessageSerializer$SerializationException: Failed to serialize remote message [class akka.actor.ActorSelectionMessage] using serializer [class akka.remote.serialization.MessageContainerSerializer].
at akka.remote.MessageSerializer$.serialize(MessageSerializer.scala:61)[209:com.typesafe.akka.remote:2.4.17]
at akka.remote.EndpointWriter$$anonfun$serializeMessage$1.apply(Endpoint.scala:895)[209:com.typesafe.akka.remote:2.4.17]
at akka.remote.EndpointWriter$$anonfun$serializeMessage$1.apply(Endpoint.scala:895)[209:com.typesafe.akka.remote:2.4.17]
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)[194:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at akka.remote.EndpointWriter.serializeMessage(Endpoint.scala:894)[209:com.typesafe.akka.remote:2.4.17]
at akka.remote.EndpointWriter.writeSend(Endpoint.scala:786)[209:com.typesafe.akka.remote:2.4.17]
at akka.remote.EndpointWriter$$anonfun$4.applyOrElse(Endpoint.scala:761)[209:com.typesafe.akka.remote:2.4.17]
at akka.actor.Actor$class.aroundReceive(Actor.scala:497)[198:com.typesafe.akka.actor:2.4.17]
at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:452)[209:com.typesafe.akka.remote:2.4.17]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[198:com.typesafe.akka.actor:2.4.17]
at akka.actor.ActorCell.invoke(ActorCell.scala:495)[198:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[198:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.run(Mailbox.scala:224)[198:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[198:com.typesafe.akka.actor:2.4.17]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[194:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[194:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[194:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[194:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
Caused by: java.lang.NullPointerException
at org.opendaylight.controller.cluster.datastore.modification.MutableCompositeModification.writeExternal(MutableCompositeModification.java:120)[232:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.messages.BatchedModifications.writeExternal(BatchedModifications.java:78)[232:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)[:1.8.0_121]
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)[:1.8.0_121]
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)[:1.8.0_121]
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)[:1.8.0_121]
at akka.serialization.JavaSerializer$$anonfun$toBinary$1.apply$mcV$sp(Serializer.scala:321)[198:com.typesafe.akka.actor:2.4.17]
at akka.serialization.JavaSerializer$$anonfun$toBinary$1.apply(Serializer.scala:321)[198:com.typesafe.akka.actor:2.4.17]
at akka.serialization.JavaSerializer$$anonfun$toBinary$1.apply(Serializer.scala:321)[198:com.typesafe.akka.actor:2.4.17]
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)[194:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at akka.serialization.JavaSerializer.toBinary(Serializer.scala:321)[198:com.typesafe.akka.actor:2.4.17]
at akka.remote.serialization.MessageContainerSerializer.serializeSelection(MessageContainerSerializer.scala:45)[209:com.typesafe.akka.remote:2.4.17]
at akka.remote.serialization.MessageContainerSerializer.toBinary(MessageContainerSerializer.scala:34)[209:com.typesafe.akka.remote:2.4.17]
at akka.remote.MessageSerializer$.serialize(MessageSerializer.scala:47)[209:com.typesafe.akka.remote:2.4.17]
... 17 more
2017-03-02 12:14:59,194 | ERROR | lt-dispatcher-18 | TransactionChainProxy | 232 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Tx: member-1-datastore-operational-fe-0-chn-22-txn-16-0 - ready future failed for previous Tx member-1-datastore-operational-fe-0-chn-22-txn-16-0
2017-03-02 12:14:59,199 | ERROR | CommitFutures-5 | ExecutionList | 84 - com.google.guava - 18.0.0 | RuntimeException while executing runnable com.google.common.util.concurrent.Futures$6@15e5421d with executor INSTANCE
java.lang.IllegalStateException: Attempted to close chain with outstanding transaction PingPongTransaction

{delegate=org.opendaylight.controller.cluster.databroker.DOMBrokerReadWriteTransaction@1b22af09}

at com.google.common.base.Preconditions.checkState(Preconditions.java:197)[84:com.google.guava:18.0.0]
at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.close(PingPongTransactionChain.java:373)[184:org.opendaylight.controller.sal-broker-impl:1.5.0.SNAPSHOT]
at org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter.close(BindingDOMTransactionChainAdapter.java:115)[186:org.opendaylight.controller.sal-binding-broker-impl:1.5.0.SNAPSHOT]
at java.util.Optional.ifPresent(Optional.java:159)[:1.8.0_121]
at org.opendaylight.openflowplugin.impl.device.TransactionChainManager.createTxChain(TransactionChainManager.java:84)[248:org.opendaylight.openflowplugin.impl:0.4.0.SNAPSHOT]
at org.opendaylight.openflowplugin.impl.device.TransactionChainManager.recreateTxChain(TransactionChainManager.java:250)[248:org.opendaylight.openflowplugin.impl:0.4.0.SNAPSHOT]
at org.opendaylight.openflowplugin.impl.device.TransactionChainManager.onTransactionChainFailed(TransactionChainManager.java:239)[248:org.opendaylight.openflowplugin.impl:0.4.0.SNAPSHOT]
at org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter.failTransactionChain(BindingDOMTransactionChainAdapter.java:110)[186:org.opendaylight.controller.sal-binding-broker-impl:1.5.0.SNAPSHOT]
at org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter.access$200(BindingDOMTransactionChainAdapter.java:31)[186:org.opendaylight.controller.sal-binding-broker-impl:1.5.0.SNAPSHOT]
at org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter$3.onFailure(BindingDOMTransactionChainAdapter.java:91)[186:org.opendaylight.controller.sal-binding-broker-impl:1.5.0.SNAPSHOT]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)[84:com.google.guava:18.0.0]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)[84:com.google.guava:18.0.0]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)[84:com.google.guava:18.0.0]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)[84:com.google.guava:18.0.0]
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)[84:com.google.guava:18.0.0]
at com.google.common.util.concurrent.SettableFuture.setException(SettableFuture.java:68)[84:com.google.guava:18.0.0]
at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransaction.onFailure(PingPongTransaction.java:65)[184:org.opendaylight.controller.sal-broker-impl:1.5.0.SNAPSHOT]
at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.transactionFailed(PingPongTransactionChain.java:293)[184:org.opendaylight.controller.sal-broker-impl:1.5.0.SNAPSHOT]
at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain$2.onFailure(PingPongTransactionChain.java:243)[184:org.opendaylight.controller.sal-broker-impl:1.5.0.SNAPSHOT]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)[84:com.google.guava:18.0.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)[:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)[:1.8.0_121]
at java.lang.Thread.run(Thread.java:745)[:1.8.0_121]
2017-03-02 12:14:59,202 | ERROR | CommitFutures-5 | TransactionChainManager | 248 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | Transaction commit failed.
TransactionCommitFailedException

{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=java.lang.IllegalStateException: The total number of batched messages received 4 does not match the number sent 6]]}

at org.opendaylight.controller.md.sal.dom.broker.impl.TransactionCommitFailedExceptionMapper.newWithCause(TransactionCommitFailedExceptionMapper.java:37)
at org.opendaylight.controller.md.sal.dom.broker.impl.TransactionCommitFailedExceptionMapper.newWithCause(TransactionCommitFailedExceptionMapper.java:18)
at org.opendaylight.yangtools.util.concurrent.ExceptionMapper.apply(ExceptionMapper.java:102)
at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker.handleException(ConcurrentDOMDataBroker.java:222)
at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker.access$000(ConcurrentDOMDataBroker.java:46)
at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker$1.onFailure(ConcurrentDOMDataBroker.java:126)
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
at com.google.common.util.concurrent.SettableFuture.setException(SettableFuture.java:68)
at org.opendaylight.controller.cluster.datastore.SingleCommitCohortProxy$1.onComplete(SingleCommitCohortProxy.java:58)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalStateException: The total number of batched messages received 4 does not match the number sent 6
at org.opendaylight.controller.cluster.datastore.ShardWriteTransaction.batchedModifications(ShardWriteTransaction.java:75)
at org.opendaylight.controller.cluster.datastore.ShardWriteTransaction.handleReceive(ShardWriteTransaction.java:48)
at org.opendaylight.controller.cluster.datastore.ShardReadWriteTransaction.handleReceive(ShardReadWriteTransaction.java:34)
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedActor.onReceive(AbstractUntypedActor.java:28)
at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104)
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
at akka.actor.Actor$class.aroundReceive(Actor.scala:497)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
... 4 more
2017-03-02 12:14:59,207 | ERROR | CommitFutures-5 | TransactionChainManager | 248 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | Transaction commit failed.
TransactionCommitFailedException

{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=java.lang.IllegalStateException: The total number of batched messages received 4 does not match the number sent 6]]}

at org.opendaylight.controller.md.sal.dom.broker.impl.TransactionCommitFailedExceptionMapper.newWithCause(TransactionCommitFailedExceptionMapper.java:37)
at org.opendaylight.controller.md.sal.dom.broker.impl.TransactionCommitFailedExceptionMapper.newWithCause(TransactionCommitFailedExceptionMapper.java:18)
at org.opendaylight.yangtools.util.concurrent.ExceptionMapper.apply(ExceptionMapper.java:102)
at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker.handleException(ConcurrentDOMDataBroker.java:222)
at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker.access$000(ConcurrentDOMDataBroker.java:46)
at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker$1.onFailure(ConcurrentDOMDataBroker.java:126)
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
at com.google.common.util.concurrent.SettableFuture.setException(SettableFuture.java:68)
at org.opendaylight.controller.cluster.datastore.SingleCommitCohortProxy$1.onComplete(SingleCommitCohortProxy.java:58)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalStateException: The total number of batched messages received 4 does not match the number sent 6
at org.opendaylight.controller.cluster.datastore.ShardWriteTransaction.batchedModifications(ShardWriteTransaction.java:75)
at org.opendaylight.controller.cluster.datastore.ShardWriteTransaction.handleReceive(ShardWriteTransaction.java:48)
at org.opendaylight.controller.cluster.datastore.ShardReadWriteTransaction.handleReceive(ShardReadWriteTransaction.java:34)
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedActor.onReceive(AbstractUntypedActor.java:28)
at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:104)
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
at akka.actor.Actor$class.aroundReceive(Actor.scala:497)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
... 4 more



 Comments   
Comment by Luis Gomez [ 06/Mar/17 ]

Adding the failing job:

https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-periodic-restconf-clustering-perf-daily-only-carbon/

Comment by Tom Pantelis [ 06/Mar/17 ]

Submitted https://git.opendaylight.org/gerrit/#/c/52883/ for

java.lang.NullPointerException
at org.opendaylight.controller.cluster.datastore.modification.MutableCompositeModification.writeExternal(MutableCompositeModification.java:120)

Comment by Tom Pantelis [ 06/Mar/17 ]

After looking at this some more, I believe the serialization NPE and subsequent "The total number of batched messages received 4 does not match the number sent 6" error are a result of the client accessing the transaction concurrently (see my comment in https://git.opendaylight.org/gerrit/#/c/52883). The internal RemoteTransactionContext is not thread-safe as it is assumed clients will not access it concurrently and I believe this is mandated in the API contract. Even if it was thread-safe internally, other problems can occur if access isn't properly coordinated on the client side, e.g. doing a put after submitting a transaction.

In talking with Anil, the OF TransactionChainManager isn't properly synchronized. In fact, we had looked at it a while ago and concluded there were holes in it.

Comment by Robert Varga [ 07/Mar/17 ]

ofp patch: https://git.opendaylight.org/gerrit/52965

Comment by Luis Gomez [ 09/Mar/17 ]

Thanks Robert

Comment by Luis Gomez [ 09/Mar/17 ]

And Tom, I also see you here

Comment by Tom Pantelis [ 09/Mar/17 ]

Present and accounted for

(In reply to Luis Gomez from comment #6)
> And Tom, I also see you here

Generated at Wed Feb 07 20:33:35 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.