[OPNFLWPLUG-690] He plugin: Statistics collector thread may be blocked forever. Created: 19/May/16  Updated: 27/Sep/21  Resolved: 08/Jul/16

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Shigeru Yasuda Assignee: Anil Vishnoi
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 5916

 Description   

Version:
master (49cc69e154249c8cd0942905fa321e6b4e0fb46d)

Steps to reproduce:
1. Run controller and openflowplugin-he.
2. Start mininet, and then stop it.
3. Repeat step 2 several times.

Step 2 may produce the following warning in the log.

2016-05-19 19:57:42,171 | WARN | Pool-8-worker-11 | StatRpcMsgManagerImpl | 179 - org.opendaylight.openflowplugin.applications.statistics-manager - 0.3.0.SNAPSHOT | Response Registration for Statistics RPC call fail!
org.opendaylight.controller.md.sal.dom.api.DOMRpcImplementationNotAvailableException: No local or remote implementation available for rpc AbsoluteSchemaPath{path=[(urn:opendaylight:group:statistics?revision=2013-11-11)get-all-group-statistics]}
at org.opendaylight.controller.remote.rpc.RemoteRpcImplementation$1.onComplete(RemoteRpcImplementation.java:66)[161:org.opendaylight.controller.sal-remoterpc-connector:1.4.0.SNAPSHOT]
at org.opendaylight.controller.remote.rpc.RemoteRpcImplementation$1.onComplete(RemoteRpcImplementation.java:56)[161:org.opendaylight.controller.sal-remoterpc-connector:1.4.0.SNAPSHOT]
at akka.dispatch.OnComplete.internal(Future.scala:259)[146:com.typesafe.akka.actor:2.4.4]
at akka.dispatch.OnComplete.internal(Future.scala:256)[146:com.typesafe.akka.actor:2.4.4]
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)[146:com.typesafe.akka.actor:2.4.4]
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)[146:com.typesafe.akka.actor:2.4.4]
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)[142:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)[142:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[142:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[142:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[142:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[142:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]

Once that warning is logged, the statistics collector thread is blocked and nothing ever wakes it up.

"odl-stat-collector-1-thread-0" #936 prio=5 os_prio=0 tid=0x00007f68ac054000 nid=0x5914 waiting on condition [0x00007f67dbb4f000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000008bf40fd8> (a com.google.common.util.concurrent.AbstractFuture$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at org.opendaylight.openflowplugin.applications.statistics.manager.impl.StatPermCollectorImpl.collectStatCrossNetwork(StatPermCollectorImpl.java:327)
at org.opendaylight.openflowplugin.applications.statistics.manager.impl.StatPermCollectorImpl.run(StatPermCollectorImpl.java:248)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


 Comments   
Comment by Shigeru Yasuda [ 19/May/16 ]

The root cause is in StatRpcMsgManagerImpl.registrationRpcFutureCallBack().

The callback added by registrationRpcFutureCallBack() sets the transaction ID on resultTransId when the MULTIPART_REQUEST is sent successfully.
On failure, however, it never completes resultTransId, so the statistics collector thread never returns from resultTransId.get().

The callback needs to wake up the statistics collector thread in every case.
If the RPC fails, an exception that indicates the cause needs to be set on resultTransId.
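
The difference between the two callback shapes can be illustrated with a minimal, self-contained sketch. This is not the actual StatRpcMsgManagerImpl code: the thread dump shows that resultTransId is a Guava future, but here java.util.concurrent.CompletableFuture is used as a stand-in so the example runs without extra dependencies (with Guava's SettableFuture the analogous call would presumably be setException()). The class RpcCallbackSketch, the RpcCallback interface, and the await() helper are hypothetical names introduced only for illustration, and await() uses a timeout only so that the demonstration itself does not hang.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RpcCallbackSketch {

    /** Hypothetical stand-in for the callback attached to the RPC result. */
    interface RpcCallback {
        void onSuccess(long transactionId);
        void onFailure(Throwable cause);
    }

    /** Problematic shape: the failure path only logs, so resultTransId never completes. */
    static RpcCallback buggyCallback(CompletableFuture<Long> resultTransId) {
        return new RpcCallback() {
            @Override public void onSuccess(long transactionId) {
                resultTransId.complete(transactionId);
            }
            @Override public void onFailure(Throwable cause) {
                System.err.println("Response Registration for Statistics RPC call fail! " + cause);
            }
        };
    }

    /** Proposed shape: the failure path hands the cause to the future, releasing any waiter. */
    static RpcCallback fixedCallback(CompletableFuture<Long> resultTransId) {
        return new RpcCallback() {
            @Override public void onSuccess(long transactionId) {
                resultTransId.complete(transactionId);
            }
            @Override public void onFailure(Throwable cause) {
                resultTransId.completeExceptionally(cause);
            }
        };
    }

    /** Plays the collector thread; the 200 ms timeout exists only to keep this demo finite. */
    static String await(CompletableFuture<Long> resultTransId) {
        try {
            return "completed:" + resultTransId.get(200, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            return "failed:" + e.getCause().getMessage();
        } catch (TimeoutException e) {
            return "still blocked";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        Throwable rpcError = new IllegalStateException("no implementation available");

        CompletableFuture<Long> buggy = new CompletableFuture<>();
        buggyCallback(buggy).onFailure(rpcError);
        System.out.println("buggy callback: " + await(buggy));

        CompletableFuture<Long> fixed = new CompletableFuture<>();
        fixedCallback(fixed).onFailure(rpcError);
        System.out.println("fixed callback: " + await(fixed));
    }
}
```

With the first callback the waiter is only released by the demo's timeout, which mirrors the untimed get() in the thread dump that never returns. With the second, get() fails promptly with an ExecutionException that carries the RPC error, so the collector can handle it and move on.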

Comment by Shigeru Yasuda [ 19/May/16 ]

https://git.opendaylight.org/gerrit/39102 (master)

Comment by Shuva Jyoti Kar [ 11/Jun/16 ]

changes look good !

cherry-picked the same to stable/Be:
https://git.opendaylight.org/gerrit/#/c/40194/

Comment by Shuva Jyoti Kar [ 11/Jun/16 ]

oops pasted the gerrit at the wrong place.
changes do look good

Comment by Shigeru Yasuda [ 24/Jun/16 ]

https://git.opendaylight.org/gerrit/40788 (stable/beryllium)
