[OPNFLWPLUG-1090] ConcurrentModificationException: null when connecting cbench switches Created: 05/May/20 Updated: 06/Nov/20 Resolved: 28/May/20 |
|
| Status: | Resolved |
| Project: | OpenFlowPlugin |
| Component/s: | statistics-manager |
| Affects Version/s: | None |
| Fix Version/s: | Magnesium-SR1, Aluminium |
| Type: | Bug | Priority: | Highest |
| Reporter: | Jamo Luhrsen | Assignee: | Gobinath Suganthan |
| Resolution: | Done | Votes: | 0 |
| Labels: | CSIT | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
the cbench csit job is frequently failing because some switches are getting their initial karaf.log shows this: 2020-04-30T22:28:37,238 | WARN | epollEventLoopGroup-9-4 | HandshakeListenerImpl | 310 - org.opendaylight.openflowplugin.impl - 0.10.1 | initial processing failed for device openflow:3 java.util.ConcurrentModificationException: null at java.util.HashMap.computeIfAbsent(HashMap.java:1134) ~[?:?] at org.opendaylight.openflowplugin.impl.statistics.ofpspecific.SessionStatistics.getConnectionEvents(SessionStatistics.java:36) ~[310:org.opendaylight.openflowplugin.impl:0.10.1] at org.opendaylight.openflowplugin.impl.statistics.ofpspecific.SessionStatistics.countEvent(SessionStatistics.java:25) ~[310:org.opendaylight.openflowplugin.impl:0.10.1] at org.opendaylight.openflowplugin.impl.connection.listener.HandshakeListenerImpl$1.onSuccess(HandshakeListenerImpl.java:81) [310:org.opendaylight.openflowplugin.impl:0.10.1] at org.opendaylight.openflowplugin.impl.connection.listener.HandshakeListenerImpl$1.onSuccess(HandshakeListenerImpl.java:68) [310:org.opendaylight.openflowplugin.impl:0.10.1] at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1076) [27:com.google.guava:27.1.0.jre] at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) [27:com.google.guava:27.1.0.jre] at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138) [27:com.google.guava:27.1.0.jre] at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958) [27:com.google.guava:27.1.0.jre] at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:726) [27:com.google.guava:27.1.0.jre] at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:48) [27:com.google.guava:27.1.0.jre] at org.opendaylight.openflowjava.protocol.impl.core.connection.AbstractRpcListener.successfulRpc(AbstractRpcListener.java:91) [317:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.10.1] at org.opendaylight.openflowjava.protocol.impl.core.connection.ResponseExpectedRpcListener.completed(ResponseExpectedRpcListener.java:37) [317:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.10.1] at org.opendaylight.openflowjava.protocol.impl.core.connection.ConnectionAdapterImpl.consumeDeviceMessage(ConnectionAdapterImpl.java:160) [317:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.10.1] at org.opendaylight.openflowjava.protocol.impl.core.connection.AbstractConnectionAdapterStatistics.consume(AbstractConnectionAdapterStatistics.java:68) [317:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.10.1] at org.opendaylight.openflowjava.protocol.impl.core.connection.ConnectionAdapterImpl.consume(ConnectionAdapterImpl.java:48) [317:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.10.1] at org.opendaylight.openflowjava.protocol.impl.core.DelegatingInboundHandler.channelRead(DelegatingInboundHandler.java:44) [317:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.10.1] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [57:io.netty.transport:4.1.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [57:io.netty.transport:4.1.48.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [57:io.netty.transport:4.1.48.Final] at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) [55:io.netty.handler:4.1.48.Final] at org.opendaylight.openflowjava.protocol.impl.core.IdleHandler.channelRead(IdleHandler.java:37) [317:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.10.1] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [57:io.netty.transport:4.1.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [57:io.netty.transport:4.1.48.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [57:io.netty.transport:4.1.48.Final] at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) [52:io.netty.codec:4.1.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [57:io.netty.transport:4.1.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [57:io.netty.transport:4.1.48.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [57:io.netty.transport:4.1.48.Final] at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321) [52:io.netty.codec:4.1.48.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295) [52:io.netty.codec:4.1.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [57:io.netty.transport:4.1.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [57:io.netty.transport:4.1.48.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [57:io.netty.transport:4.1.48.Final] at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321) [52:io.netty.codec:4.1.48.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295) [52:io.netty.codec:4.1.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [57:io.netty.transport:4.1.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [57:io.netty.transport:4.1.48.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [57:io.netty.transport:4.1.48.Final] at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [57:io.netty.transport:4.1.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [57:io.netty.transport:4.1.48.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [57:io.netty.transport:4.1.48.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [57:io.netty.transport:4.1.48.Final] at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792) [58:io.netty.transport-native-epoll:4.1.48.Final] at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475) [58:io.netty.transport-native-epoll:4.1.48.Final] at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) [58:io.netty.transport-native-epoll:4.1.48.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [54:io.netty.common:4.1.48.Final] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [54:io.netty.common:4.1.48.Final] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [54:io.netty.common:4.1.48.Final] at java.lang.Thread.run(Thread.java:834) [?:?] 2 that snippet came from this job Also, in the same log, the below exception is seen. If it's unrelated we may need to open 2020-04-30T22:28:37,377 | WARN | opendaylight-cluster-data-akka.actor.default-dispatcher-35 | ClusterSingletonServiceGroupImpl | 275 - org.opendaylight.mdsal.singleton-dom-impl - 5.0.10 | Service group openflow:10 service org.opendaylight.openflowplugin.impl.lifecycle.ContextChainImpl@5b477b89 failed to stop, attempting to continue java.util.ConcurrentModificationException: null at java.util.AbstractList$RandomAccessSpliterator.get(AbstractList.java:737) ~[?:?] at java.util.AbstractList$RandomAccessSpliterator.forEachRemaining(AbstractList.java:720) ~[?:?] at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?] at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?] at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) ~[?:?] at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?] at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?] at org.opendaylight.openflowplugin.impl.lifecycle.ContextChainImpl.closeServiceInstance(ContextChainImpl.java:91) ~[?:?] at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.ensureStopping(ClusterSingletonServiceGroupImpl.java:675) ~[?:?] at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.ensureServicesStopping(ClusterSingletonServiceGroupImpl.java:657) ~[?:?] at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.tryReconcileState(ClusterSingletonServiceGroupImpl.java:571) ~[?:?] at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.reconcileState(ClusterSingletonServiceGroupImpl.java:460) ~[?:?] at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.ownershipChanged(ClusterSingletonServiceGroupImpl.java:341) ~[?:?] at org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl.ownershipChanged(AbstractClusterSingletonServiceProviderImpl.java:236) ~[?:?] at org.opendaylight.mdsal.singleton.dom.impl.DOMClusterSingletonServiceProviderImpl.ownershipChanged(DOMClusterSingletonServiceProviderImpl.java:22) ~[?:?] at org.opendaylight.controller.cluster.entityownership.EntityOwnershipListenerActor.onEntityOwnershipChanged(EntityOwnershipListenerActor.java:45) ~[223:org.opendaylight.controller.sal-distributed-eos:1.11.1] at org.opendaylight.controller.cluster.entityownership.EntityOwnershipListenerActor.handleReceive(EntityOwnershipListenerActor.java:34) ~[223:org.opendaylight.controller.sal-distributed-eos:1.11.1] at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24) [33:com.typesafe.akka.actor:2.5.31] at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20) [33:com.typesafe.akka.actor:2.5.31] at scala.PartialFunction.applyOrElse(PartialFunction.scala:127) [382:org.scala-lang.scala-library:2.12.11.v20200311-100536-VFINAL-cd8410d] at scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) [382:org.scala-lang.scala-library:2.12.11.v20200311-100536-VFINAL-cd8410d] at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20) [33:com.typesafe.akka.actor:2.5.31] at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) [382:org.scala-lang.scala-library:2.12.11.v20200311-100536-VFINAL-cd8410d] at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) [382:org.scala-lang.scala-library:2.12.11.v20200311-100536-VFINAL-cd8410d] at akka.actor.Actor.aroundReceive(Actor.scala:539) [33:com.typesafe.akka.actor:2.5.31] at akka.actor.Actor.aroundReceive$(Actor.scala:537) [33:com.typesafe.akka.actor:2.5.31] at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:227) [33:com.typesafe.akka.actor:2.5.31] at akka.actor.ActorCell.receiveMessage(ActorCell.scala:614) [33:com.typesafe.akka.actor:2.5.31] at akka.actor.ActorCell.invoke(ActorCell.scala:583) [33:com.typesafe.akka.actor:2.5.31] at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:268) [33:com.typesafe.akka.actor:2.5.31] at akka.dispatch.Mailbox.run(Mailbox.scala:229) [33:com.typesafe.akka.actor:2.5.31] at akka.dispatch.Mailbox.exec(Mailbox.scala:241) [33:com.typesafe.akka.actor:2.5.31] at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [33:com.typesafe.akka.actor:2.5.31] at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [33:com.typesafe.akka.actor:2.5.31] at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [33:com.typesafe.akka.actor:2.5.31] at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [33:com.typesafe.akka.actor:2.5.31] 2 |
| Comments |
| Comment by Gobinath Suganthan [ 06/May/20 ] |
|
jluhrsen I had earlier used "test-openflowplugin-cbench" for running cbench tests for debugging a different cbench failure. Can it be used to reproduce this issue too? |
| Comment by Jamo Luhrsen [ 06/May/20 ] |
|
yes, and looks like you already did it successfully with the cbench test passing. I'll try to run it a few more times, but good The bigger question here is if this issue is something we probably want to get in to Magnesium SR1 or if you know it will |
| Comment by Gobinath Suganthan [ 07/May/20 ] |
|
jluhrsen This error had resulted from the race condition caused because of many switch handshakes happening at the same time. How many switches are involved in cbench tests and how fast are they flapped? I think this could occur in production too when there are many switch handshakes (large scale). |
| Comment by Jamo Luhrsen [ 07/May/20 ] |
|
This is only 16 switches and the problem is on initial connection. Does not seem like an unreasonable state I guess. If your |
| Comment by Gobinath Suganthan [ 07/May/20 ] |
|
jluhrsen I have updated my patch and added reviewers from OFPlugin. We can merge if one of them reviews and gives +2. |