[OPNFLWPLUG-1090] ConcurrentModificationException: null when connecting cbench switches Created: 05/May/20  Updated: 06/Nov/20  Resolved: 28/May/20

Status: Resolved
Project: OpenFlowPlugin
Component/s: statistics-manager
Affects Version/s: None
Fix Version/s: Magnesium-SR1, Aluminium

Type: Bug Priority: Highest
Reporter: Jamo Luhrsen Assignee: Gobinath Suganthan
Resolution: Done Votes: 0
Labels: CSIT
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

the cbench csit job is frequently failing because some switches are getting their initial
connection rejected.

karaf.log shows this:

2020-04-30T22:28:37,238 | WARN  | epollEventLoopGroup-9-4 | HandshakeListenerImpl            | 310 - org.opendaylight.openflowplugin.impl - 0.10.1 | initial processing failed for device openflow:3
java.util.ConcurrentModificationException: null
	at java.util.HashMap.computeIfAbsent(HashMap.java:1134) ~[?:?]
	at org.opendaylight.openflowplugin.impl.statistics.ofpspecific.SessionStatistics.getConnectionEvents(SessionStatistics.java:36) ~[310:org.opendaylight.openflowplugin.impl:0.10.1]
	at org.opendaylight.openflowplugin.impl.statistics.ofpspecific.SessionStatistics.countEvent(SessionStatistics.java:25) ~[310:org.opendaylight.openflowplugin.impl:0.10.1]
	at org.opendaylight.openflowplugin.impl.connection.listener.HandshakeListenerImpl$1.onSuccess(HandshakeListenerImpl.java:81) [310:org.opendaylight.openflowplugin.impl:0.10.1]
	at org.opendaylight.openflowplugin.impl.connection.listener.HandshakeListenerImpl$1.onSuccess(HandshakeListenerImpl.java:68) [310:org.opendaylight.openflowplugin.impl:0.10.1]
	at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1076) [27:com.google.guava:27.1.0.jre]
	at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) [27:com.google.guava:27.1.0.jre]
	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138) [27:com.google.guava:27.1.0.jre]
	at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958) [27:com.google.guava:27.1.0.jre]
	at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:726) [27:com.google.guava:27.1.0.jre]
	at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:48) [27:com.google.guava:27.1.0.jre]
	at org.opendaylight.openflowjava.protocol.impl.core.connection.AbstractRpcListener.successfulRpc(AbstractRpcListener.java:91) [317:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.10.1]
	at org.opendaylight.openflowjava.protocol.impl.core.connection.ResponseExpectedRpcListener.completed(ResponseExpectedRpcListener.java:37) [317:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.10.1]
	at org.opendaylight.openflowjava.protocol.impl.core.connection.ConnectionAdapterImpl.consumeDeviceMessage(ConnectionAdapterImpl.java:160) [317:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.10.1]
	at org.opendaylight.openflowjava.protocol.impl.core.connection.AbstractConnectionAdapterStatistics.consume(AbstractConnectionAdapterStatistics.java:68) [317:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.10.1]
	at org.opendaylight.openflowjava.protocol.impl.core.connection.ConnectionAdapterImpl.consume(ConnectionAdapterImpl.java:48) [317:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.10.1]
	at org.opendaylight.openflowjava.protocol.impl.core.DelegatingInboundHandler.channelRead(DelegatingInboundHandler.java:44) [317:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.10.1]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [57:io.netty.transport:4.1.48.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [57:io.netty.transport:4.1.48.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [57:io.netty.transport:4.1.48.Final]
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) [55:io.netty.handler:4.1.48.Final]
	at org.opendaylight.openflowjava.protocol.impl.core.IdleHandler.channelRead(IdleHandler.java:37) [317:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.10.1]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [57:io.netty.transport:4.1.48.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [57:io.netty.transport:4.1.48.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [57:io.netty.transport:4.1.48.Final]
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) [52:io.netty.codec:4.1.48.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [57:io.netty.transport:4.1.48.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [57:io.netty.transport:4.1.48.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [57:io.netty.transport:4.1.48.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321) [52:io.netty.codec:4.1.48.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295) [52:io.netty.codec:4.1.48.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [57:io.netty.transport:4.1.48.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [57:io.netty.transport:4.1.48.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [57:io.netty.transport:4.1.48.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321) [52:io.netty.codec:4.1.48.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295) [52:io.netty.codec:4.1.48.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [57:io.netty.transport:4.1.48.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [57:io.netty.transport:4.1.48.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [57:io.netty.transport:4.1.48.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [57:io.netty.transport:4.1.48.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [57:io.netty.transport:4.1.48.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [57:io.netty.transport:4.1.48.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [57:io.netty.transport:4.1.48.Final]
	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792) [58:io.netty.transport-native-epoll:4.1.48.Final]
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475) [58:io.netty.transport-native-epoll:4.1.48.Final]
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) [58:io.netty.transport-native-epoll:4.1.48.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [54:io.netty.common:4.1.48.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [54:io.netty.common:4.1.48.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [54:io.netty.common:4.1.48.Final]
	at java.lang.Thread.run(Thread.java:834) [?:?]
2

that snippet came from this job

Also, in the same log, the below exception is seen. If it's unrelated we may need to open
a new bug. It's definitely coming from a different code path.

2020-04-30T22:28:37,377 | WARN  | opendaylight-cluster-data-akka.actor.default-dispatcher-35 | ClusterSingletonServiceGroupImpl | 275 - org.opendaylight.mdsal.singleton-dom-impl - 5.0.10 | Service group openflow:10 service org.opendaylight.openflowplugin.impl.lifecycle.ContextChainImpl@5b477b89 failed to stop, attempting to continue
java.util.ConcurrentModificationException: null
	at java.util.AbstractList$RandomAccessSpliterator.get(AbstractList.java:737) ~[?:?]
	at java.util.AbstractList$RandomAccessSpliterator.forEachRemaining(AbstractList.java:720) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
	at org.opendaylight.openflowplugin.impl.lifecycle.ContextChainImpl.closeServiceInstance(ContextChainImpl.java:91) ~[?:?]
	at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.ensureStopping(ClusterSingletonServiceGroupImpl.java:675) ~[?:?]
	at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.ensureServicesStopping(ClusterSingletonServiceGroupImpl.java:657) ~[?:?]
	at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.tryReconcileState(ClusterSingletonServiceGroupImpl.java:571) ~[?:?]
	at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.reconcileState(ClusterSingletonServiceGroupImpl.java:460) ~[?:?]
	at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.ownershipChanged(ClusterSingletonServiceGroupImpl.java:341) ~[?:?]
	at org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl.ownershipChanged(AbstractClusterSingletonServiceProviderImpl.java:236) ~[?:?]
	at org.opendaylight.mdsal.singleton.dom.impl.DOMClusterSingletonServiceProviderImpl.ownershipChanged(DOMClusterSingletonServiceProviderImpl.java:22) ~[?:?]
	at org.opendaylight.controller.cluster.entityownership.EntityOwnershipListenerActor.onEntityOwnershipChanged(EntityOwnershipListenerActor.java:45) ~[223:org.opendaylight.controller.sal-distributed-eos:1.11.1]
	at org.opendaylight.controller.cluster.entityownership.EntityOwnershipListenerActor.handleReceive(EntityOwnershipListenerActor.java:34) ~[223:org.opendaylight.controller.sal-distributed-eos:1.11.1]
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24) [33:com.typesafe.akka.actor:2.5.31]
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20) [33:com.typesafe.akka.actor:2.5.31]
	at scala.PartialFunction.applyOrElse(PartialFunction.scala:127) [382:org.scala-lang.scala-library:2.12.11.v20200311-100536-VFINAL-cd8410d]
	at scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) [382:org.scala-lang.scala-library:2.12.11.v20200311-100536-VFINAL-cd8410d]
	at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20) [33:com.typesafe.akka.actor:2.5.31]
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) [382:org.scala-lang.scala-library:2.12.11.v20200311-100536-VFINAL-cd8410d]
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) [382:org.scala-lang.scala-library:2.12.11.v20200311-100536-VFINAL-cd8410d]
	at akka.actor.Actor.aroundReceive(Actor.scala:539) [33:com.typesafe.akka.actor:2.5.31]
	at akka.actor.Actor.aroundReceive$(Actor.scala:537) [33:com.typesafe.akka.actor:2.5.31]
	at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:227) [33:com.typesafe.akka.actor:2.5.31]
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:614) [33:com.typesafe.akka.actor:2.5.31]
	at akka.actor.ActorCell.invoke(ActorCell.scala:583) [33:com.typesafe.akka.actor:2.5.31]
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:268) [33:com.typesafe.akka.actor:2.5.31]
	at akka.dispatch.Mailbox.run(Mailbox.scala:229) [33:com.typesafe.akka.actor:2.5.31]
	at akka.dispatch.Mailbox.exec(Mailbox.scala:241) [33:com.typesafe.akka.actor:2.5.31]
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [33:com.typesafe.akka.actor:2.5.31]
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [33:com.typesafe.akka.actor:2.5.31]
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [33:com.typesafe.akka.actor:2.5.31]
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [33:com.typesafe.akka.actor:2.5.31]
2


 Comments   
Comment by Gobinath Suganthan [ 06/May/20 ]

jluhrsen I had earlier used "test-openflowplugin-cbench" for running cbench tests for debugging a different cbench failure. Can it be used to reproduce this issue too?

Comment by Jamo Luhrsen [ 06/May/20 ]

yes, and looks like you already did it successfully with the cbench test passing. I'll try to run it a few more times, but good
news so far.

The bigger question here is if this issue is something we probably want to get in to Magnesium SR1 or if you know it will
not really be a problem for end-users. Is it possible for production environment switches to run in to this?

Comment by Gobinath Suganthan [ 07/May/20 ]

jluhrsen This error had resulted from the race condition caused because of many switch handshakes happening at the same time. How many switches are involved in cbench tests and how fast are they flapped?

I think this could occur in production too when there are many switch handshakes (large scale).

Comment by Jamo Luhrsen [ 07/May/20 ]

This is only 16 switches and the problem is on initial connection. Does not seem like an unreasonable state I guess. If your
fix is ready to go, we can try to get it merged asap and see about a respin for the Magnesium SR1 release.

Comment by Gobinath Suganthan [ 07/May/20 ]

jluhrsen I have updated my patch and added reviewers from OFPlugin. We can merge if one of them reviews and gives +2.

Generated at Wed Feb 07 20:34:10 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.