[BGPCEP-766] Deadlocked PCEP Created: 16/Mar/18  Updated: 18/Apr/18  Resolved: 21/Mar/18

Status: Verified
Project: bgpcep
Component/s: PCEP
Affects Version/s: None
Fix Version/s: Oxygen

Type: Bug Priority: Medium
Reporter: Claudio David Gasparini Assignee: Claudio David Gasparini
Resolution: Done Votes: 0
Labels: csit:exception
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/bgpcep-csit-1node-periodic-throughpcep-all-oxygen/215/odl1_karaf.log.gz

 

2018-03-16T00:47:59,642 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 355 - org.opendaylight.infrautils.metrics-impl - 1.3.0.SNAPSHOT | Oh nose - there are 2 deadlocked threads!! 2018-03-16T00:47:59,643 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 355 - org.opendaylight.infrautils.metrics-impl - 1.3.0.SNAPSHOT | Deadlocked thread stack trace: CommitFutures-4 locked on org.opendaylight.bgpcep.pcep.topology.provider.Stateful07TopologySessionListener@4c78910c (owned by epollEventLoopGroup-5-1): at org.opendaylight.bgpcep.pcep.topology.provider.AbstractTopologySessionListener$3.onSuccess(AbstractTopologySessionListener.java:288) at org.opendaylight.bgpcep.pcep.topology.provider.AbstractTopologySessionListener$3.onSuccess(AbstractTopologySessionListener.java:285) at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1377) at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:399) at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1016) at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:867) at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:691) at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:48) at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransaction.onSuccess(PingPongTransaction.java:60) at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.transactionSuccessful(PingPongTransactionChain.java:294) at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain$2.onSuccess(PingPongTransactionChain.java:246) at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain$2.onSuccess(PingPongTransactionChain.java:243) at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1377) at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker$AsyncNotifyingSettableFuture$DelegatingRunnable.run(ConcurrentDOMDataBroker.java:320) at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:399) at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1016) at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:672) at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker$AsyncNotifyingSettableFuture.addListener(ConcurrentDOMDataBroker.java:278) at com.google.common.util.concurrent.ForwardingListenableFuture.addListener(ForwardingListenableFuture.java:45) at com.google.common.util.concurrent.Futures.addCallback(Futures.java:1352) at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.processTransaction(PingPongTransactionChain.java:243) at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.processNextTransaction(PingPongTransactionChain.java:283) at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.transactionSuccessful(PingPongTransactionChain.java:295) at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain$2.onSuccess(PingPongTransactionChain.java:246) at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain$2.onSuccess(PingPongTransactionChain.java:243) at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1377) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2018-03-16T00:47:59,643 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 355 - org.opendaylight.infrautils.metrics-impl - 1.3.0.SNAPSHOT | Deadlocked thread stack trace: epollEventLoopGroup-5-1 locked on org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain@42bf66f3 (owned by CommitFutures-4): at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.slowAllocateTransaction(PingPongTransactionChain.java:169) at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.allocateTransaction(PingPongTransactionChain.java:195) at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.newReadWriteTransaction(PingPongTransactionChain.java:445) at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.newWriteOnlyTransaction(PingPongTransactionChain.java:487) at org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter.newWriteOnlyTransaction(BindingDOMTransactionChainAdapter.java:75) at org.opendaylight.bgpcep.pcep.topology.provider.AbstractTopologySessionListener.onMessage(AbstractTopologySessionListener.java:276) at org.opendaylight.protocol.pcep.impl.PCEPSessionImpl.handleMessage(PCEPSessionImpl.java:354) at org.opendaylight.protocol.pcep.impl.PCEPSessionImpl.channelRead0(PCEPSessionImpl.java:420) at org.opendaylight.protocol.pcep.impl.PCEPSessionImpl.channelRead0(PCEPSessionImpl.java:59) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935) at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:797) at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:404) at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:304) at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) at java.lang.Thread.run(Thread.java:748) 2018-03-16T00:47:59,870 | WARN | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 355 - org.opendaylight.infrautils.metrics-impl - 1.3.0.SNAPSHOT | Thread Dump: "main" id=1 state=WAITING - waiting on <0x7d65c2fd> (a java.util.concurrent.atomic.AtomicReference) - locked <0x7d65c2fd> (a java.util.concurrent.atomic.AtomicReference) at java.lang.Object.wait(Native Method) at org.eclipse.osgi.container.SystemModule.waitForStop(SystemModule.java:168) at org.eclipse.osgi.internal.framework.EquinoxBundle$SystemBundle.waitForStop(EquinoxBundle.java:250) at org.eclipse.osgi.launch.Equinox.waitForStop(Equinox.java:181) at org.apache.karaf.main.Main.awaitShutdown(Main.java:631) at org.apache.karaf.main.Main.main(Main.java:189) "Reference Handler" id=2 state=WAITING - waiting on <0x6ae9769d> (a java.lang.ref.Reference$Lock) - locked <0x6ae9769d> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at java.lang.ref.Reference.tryHandlePending(Reference.java:191) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153) "Finalizer" id=3 state=WAITING - waiting on <0x18accc92> (a java.lang.ref.ReferenceQueue$Lock) - locked <0x18accc92> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.Object.wait(Native Method) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209) "Signal Dispatcher" id=4 state=RUNNABLE "RMI TCP Accept-0" id=10 state=RUNNABLE (running in native) at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409) at java.net.ServerSocket.implAccept(ServerSocket.java:545) at java.net.ServerSocket.accept(ServerSocket.java:513) at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:52) at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:400) at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:372) at java.lang.Thread.run(Thread.java:748) "Active Thread: Equinox Container: 027c4d30-7926-451b-87ab-dcb50978668f" id=13 state=TIMED_WAITING - waiting on <0x3fc2d479> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) - locked <0x3fc2d479> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) "Framework Event Dispatcher: org.eclipse.osgi.internal.framework.EquinoxEventPublisher@4ae3c1cd" id=15 state=WAITING - waiting on <0x494bda1f> (a org.eclipse.osgi.framework.eventmgr.EventManager$EventThread) - locked <0x494bda1f> (a org.eclipse.osgi.framework.eventmgr.EventManager$EventThread) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.getNextEvent(EventManager.java:400) at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.run(EventManager.java:336) "Start Level: Equinox Container: 027c4d30-7926-451b-87ab-dcb50978668f" id=16 state=WAITING - waiting on <0x6b432e94> (a org.eclipse.osgi.framework.eventmgr.EventManager$EventThread) - locked <0x6b432e94> (a org.eclipse.osgi.framework.eventmgr.EventManager$EventThread) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.getNextEvent(EventManager.java:400) at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.run(EventManager.java:336) "Karaf Lock Monitor Thread" id=17 state=TIMED_WAITING at java.lang.Thread.sleep(Native Method) at org.apache.karaf.main.Main.doMonitor(Main.java:403) at org.apache.karaf.main.Main.access$300(Main.java:75) at org.apache.karaf.main.Main$3.run(Main.java:379) "Karaf Shutdown Socket Thread" id=18 state=RUNNABLE (running in native) at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409) at java.net.ServerSocket.implAccept(ServerSocket.java:545) at java.net.ServerSocket.accept(ServerSocket.java:513) at org.apache.karaf.main.ShutdownSocketThread.run(ShutdownSocketThread.java:57) "CM Configuration Updater" id=20 state=WAITING - waiting on <0x5cbf6ed5> (a java.util.LinkedList) - locked <0x5cbf6ed5> (a java.util.LinkedList) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:83) at java.lang.Thread.run(Thread.java:748) "CM Event Dispatcher" id=21 state=WAITING - waiting on <0x007ec037> (a java.util.LinkedList) - locked <0x007ec037> (a java.util.LinkedList) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:83) at java.lang.Thread.run(Thread.java:748) "Thread-1" id=23 state=RUNNABLE (running in native) at sun.nio.fs.LinuxWatchService.poll(Native Method) at sun.nio.fs.LinuxWatchService.access$600(LinuxWatchService.java:47) at sun.nio.fs.LinuxWatchService$Poller.run(LinuxWatchService.java:314) at java.lang.Thread.run(Thread.java:748) "fileinstall-/tmp/karaf-0.8.0-SNAPSHOT/etc" id=22 state=TIMED_WAITING - waiting on <0x38b35330> (a org.apache.felix.fileinstall.internal.DirectoryWatcher) - locked <0x38b35330> (a org.apache.felix.fileinstall.internal.DirectoryWatcher) at java.lang.Object.wait(Native Method) at org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:316) "Thread-2" id=26 state=RUNNABLE (running in native) at sun.nio.fs.LinuxWatchService.poll(Native Method) at sun.nio.fs.LinuxWatchService.access$600(LinuxWatchService.java:47) at sun.nio.fs.LinuxWatchService$Poller.run(LinuxWatchService.java:314) at java.lang.Thread.run(Thread.java:748) "fileinstall-/tmp/karaf-0.8.0-SNAPSHOT/deploy" id=25 state=TIMED_WAITING - waiting on <0x7e493056> (a org.apache.felix.fileinstall.internal.DirectoryWatcher) - locked <0x7e493056> (a org.apache.felix.fileinstall.internal.DirectoryWatcher) at java.lang.Object.wait(Native Method) at org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:316) "features-1-thread-1" id=28 state=WAITING - waiting on <0x5bdcb876> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) - locked <0x5bdcb876> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) "Bundle File Closer" id=86 state=WAITING - waiting on <0x29068112> (a org.eclipse.osgi.framework.eventmgr.EventManager$EventThread) - locked <0x29068112> (a org.eclipse.osgi.framework.eventmgr.EventManager$EventThread) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.getNextEvent(EventManager.java:400) at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.run(EventManager.java:336) "Refresh Thread: Equinox Container: 027c4d30-7926-451b-87ab-dcb50978668f" id=87 state=WAITING - waiting on <0x59c475f3> (a org.eclipse.osgi.framework.eventmgr.EventManager$EventThread) - locked <0x59c475f3> (a org.eclipse.osgi.framework.eventmgr.EventManager$EventThread) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.getNextEvent(EventManager.java:400) at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.run(EventManager.java:336) "Blueprint Extender: 1" id=91 state=WAITING - waiting on <0x3582e6e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) - locked <0x3582e6e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) "Blueprint Extender: 2" id=92 state=TIMED_WAITING - waiting on <0x3582e6e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) - locked <0x3582e6e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) "Blueprint Extender: 3" id=93 state=WAITING - waiting on <0x3582e6e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) - locked <0x3582e6e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) "SCR Component Actor" id=96 state=WAITING - waiting on <0x79d8a5d5> (a java.util.LinkedList) - locked <0x79d8a5d5> (a java.util.LinkedList) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.apache.felix.scr.impl.ComponentActorThread.run(ComponentActorThread.java:81) at java.lang.Thread.run(Thread.java:748) "Blueprint Event Dispatcher: 1" id=97 state=WAITING - waiting on <0x64893baa> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) - locked <0x64893baa> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) "Thread-45" id=101 state=RUNNABLE (running in native) at sun.nio.ch.EPoll.epollWait(Native Method) at sun.nio.ch.EPollPort$EventHandlerTask.poll(EPollPort.java:194) at sun.nio.ch.EPollPort$EventHandlerTask.run(EPollPort.java:268) at java.lang.Thread.run(Thread.java:748) "sshd-SshServer[78b267b]-timer-thread-1" id=102 state=TIMED_WAITING - waiting on <0x2e4b57df> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) - locked <0x2e4b57df> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) "RMI TCP Accept-1099" id=104 state=RUNNABLE (running in native) at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409) at java.net.ServerSocket.implAccept(ServerSocket.java:545) at java.net.ServerSocket.accept(ServerSocket.java:513) at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:400) at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:372) at java.lang.Thread.run(Thread.java:748) "RMI TCP Accept-44444" id=106 state=RUNNABLE (running in native) at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409) at java.net.ServerSocket.implAccept(ServerSocket.java:545) at java.net.ServerSocket.accept(ServerSocket.java:513) at org.apache.karaf.management.ConnectorServerFactory$LocalOnlyServerSocket.accept(ConnectorServerFactory.java:412) at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:400) at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:372) at java.lang.Thread.run(Thread.java:748) "RMI Reaper" id=107 state=WAITING - waiting on <0x385b4342> (a java.lang.ref.ReferenceQueue$Lock) - locked <0x385b4342> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.Object.wait(Native Method) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164) at sun.rmi.transport.ObjectTable$Reaper.run(ObjectTable.java:351) at java.lang.Thread.run(Thread.java:748)



 Comments   
Comment by Kit Lou [ 20/Mar/18 ]

cdgasparini - Is this issue a regression (i.e. used to work in prior releases)?  What is the customer impact?  Is there a workaround that can be documented?  Can we ship with it, i.e. fix the problem in Oxygen-SR1?

Comment by Claudio David Gasparini [ 20/Mar/18 ]

Hi klou

this is a regression, issue is a deadlock which will make the feature unusable and affecting other depending features as well, since everything will freeze.

There is no workaround, and I marked as blocker because I consider it a mandatory fix. If the issue was other than a deadlock, I wouldn't have issue to move this to SR1, but this is a must IMHO.

 

Regards, 

Comment by Daniel Farrell [ 20/Mar/18 ]

cdgasparini - It looks like this same test was failing and never signed-off for Oxygen RC1 and RC2. We should have raised this as a blocker then and had it fixed well before now. Since our drop-dead release date is tomorrow, and there isn't a patch ready-to-merge immediately, I strongly recommend we do not block the release for this. If you disagree you can take it to the TSC. 

Comment by Claudio David Gasparini [ 20/Mar/18 ]

Hi dfarrell07

test can be failing, but not with this issue, that is why I raised the blocker, and the fix is ready to be merged .So I dont see the issue on take the fix.

On the opposite side BGP and PCEP features will be locked and not usable if hitting this bug. So I stand with this bug as a blocker.

 

Regards, 

Comment by Daniel Farrell [ 20/Mar/18 ]

> So I dont see the issue on take the fix.

We would have to respin autorelease, hope we get lucky again avoid all sporadic failures (unlikely) and then ask all projects to start over with reviewing the RC, submitting their Final Milestones and signing off on test failures.

> fix is ready to be merged

Please link the fix to the Jira.

> but not with this issue

So these are the new failures:

https://jenkins.opendaylight.org/releng/job/bgpcep-csit-1node-periodic-throughpcep-all-oxygen/190/robot/bgpcep-throughpcep.txt/Cases/Verify_5/

https://jenkins.opendaylight.org/releng/job/bgpcep-csit-1node-periodic-throughpcep-all-oxygen/190/robot/bgpcep-throughpcep.txt/Cases/Updater_5/

It does look like they have been failing quite a bit recently. I also see the same jobs failing in RC1.

https://jenkins.opendaylight.org/releng/job/bgpcep-csit-1node-periodic-throughpcep-all-oxygen/180/robot/

Comment by Daniel Farrell [ 20/Mar/18 ]

I found the change: https://git.opendaylight.org/gerrit/#/c/69659/

Comment by Kit Lou [ 20/Mar/18 ]

cdgasparini - Please provide the gerrit patch info.  We should have it merged and re-spin as soon as possible.  TSC can decide whether to block the existing RC3 for it.

Comment by Claudio David Gasparini [ 20/Mar/18 ]

Yes, I know the pain of the respin. I don't argue that.

The issue resides on the logs, if we do visual check on failing test it will look the same, but checking the karaf logs

you will see -> org.opendaylight.infrautils.metrics-impl - 1.3.0 | Deadlocked

https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/bgpcep-csit-1node-periodic-throughpcep-all-oxygen/214/odl1_karaf.log.gz

vs RC1

https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/bgpcep-csit-1node-periodic-throughpcep-all-oxygen/180/odl1_karaf.log.gz

fix: https://git.opendaylight.org/gerrit/#/c/69659/

Comment by Daniel Farrell [ 20/Mar/18 ]

As discussed on IRC and mailing list, we're going to go ahead and merge 69659, kick a new autorelease build (hoping we get lucky avoiding sporadic failures) and let the TSC vote if they want to re-do RC3 in the meantime.

Comment by Daniel Farrell [ 20/Mar/18 ]

Discussion ongoing here: https://lists.opendaylight.org/pipermail/tsc/2018-March/009160.html

Comment by Claudio David Gasparini [ 20/Mar/18 ]

Thanks for your understanding, I hope we don't hit any failure.

Regards, 

Generated at Wed Feb 07 19:14:03 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.