[BGPCEP-665] InstructionDeployerImplTest hangs periodically
Created: 28/Apr/17  Updated: 03/Mar/19  Resolved: 09/Aug/17

Status: Resolved
Project: bgpcep
Component/s: General
Affects Version/s: Bugzilla Migration
Fix Version/s: Bugzilla Migration

Type: Bug
Reporter: Colin Dixon
Assignee: Robert Varga
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File dump.txt     File org.opendaylight.bgpcep.programming.impl.InstructionDeployerImplTest-output.txt.gz    
Issue Links:
Blocks
is blocked by CONTROLLER-1629 AbstractConcurrentDataBrokerTest caus... Resolved
External issue ID: 8335

 Description   

In autorelease-carbon jobs 266 and 275, this test hung for multiple hours (in 275, for at least 13 hours). We need to either fix this test or flag it with @Ignore so that we can keep making progress.



 Comments   
Comment by Colin Dixon [ 28/Apr/17 ]

Also see here:
https://lists.opendaylight.org/pipermail/release/2017-April/010465.html

Comment by Colin Dixon [ 28/Apr/17 ]

Removes the test in stable/carbon:
https://git.opendaylight.org/gerrit/#/c/56282/

I'm completely open to other solutions.

Comment by Colin Dixon [ 01/May/17 ]

An alternate fix would be to exclude the test from autorelease-carbon:
https://git.opendaylight.org/gerrit/#/c/56329/

Comment by Colin Dixon [ 01/May/17 ]

Attachment org.opendaylight.bgpcep.programming.impl.InstructionDeployerImplTest-output.txt.gz has been added with description: surefire output for the test in autorelease-carbon job 285

Comment by Colin Dixon [ 01/May/17 ]

I merged the autorelease version. We'll see if it works.

Comment by Colin Dixon [ 01/May/17 ]

Anil Belur pointed out in the comments on the autorelease patch that it might not work: Maven plugin configuration might not pass through when the build starts at the root autorelease pom.xml and then descends into <modules> that it is not the parent of.

I guess we'll find out in ~2 hours, when we will either be hung on the BGP test or not.

Comment by Colin Dixon [ 02/May/17 ]

As Anil Belur suspected, the patch to the autorelease pom file did not actually stop the test from running, as you can see here:
https://jenkins.opendaylight.org/releng/job/autorelease-release-carbon/286/

I guess we need to either find a way to put that config in the appropriate parent pom (maybe as part of the autorelease process) or, more easily, get bgpcep to fix or disable the test.

Comment by Anil Belur [ 02/May/17 ]

As per Thanh's recommendation:

I don't think patch 56329 will help. Maven uses inheritance to determine which plugin configuration to use. The pom.xml inside autorelease is not a parent pom of any project, which means the configuration stored there won't be read by the module in bgpcep.

We have to disable it either at the module level or in odlparent, which is the top-level parent pom of all projects.

Therefore, I am reverting the original patch [3.] with [1.] and retriggering autorelease carbon, since the issue is also resolved by the change to the project test in [2.]

[1.] https://git.opendaylight.org/gerrit/56360
[2.] https://git.opendaylight.org/gerrit/#/c/56282/
[3.] https://git.opendaylight.org/gerrit/#/c/56329/

Comment by Colin Dixon [ 02/May/17 ]

For what it's worth, this also fails periodically in the project's Merge Job:
https://jenkins.opendaylight.org/releng/view/Merge-Carbon/job/bgpcep-merge-carbon/lastFailedBuild/

It shows that build #190 of bgpcep-merge-carbon hung on this test.

Comment by Colin Dixon [ 02/May/17 ]

https://jenkins.opendaylight.org/releng/view/Merge-Carbon/job/bgpcep-merge-carbon/190/

Comment by Colin Dixon [ 02/May/17 ]

Downgrading to critical as the test is now @Ignored by this patch:
https://git.opendaylight.org/gerrit/#/c/56282/

Comment by Claudio David Gasparini [ 06/May/17 ]

The deadlock is caused by AbstractConcurrentDataBrokerTest.
While debugging, I see that it hangs when AbstractRegistrationTree#removeRegistration
is trying to take the lock.

I created a demo class and test to reproduce the issue easily:
https://git.opendaylight.org/gerrit/#/c/56640/

Comment by Robert Varga [ 05/Aug/17 ]

Attachment dump.txt has been added with description: Thread dump from the hang

Comment by Robert Varga [ 05/Aug/17 ]

DemoClass locking is the problem here:

"main" #1 prio=5 os_prio=0 tid=0x00007fe24000b000 nid=0x279e waiting on condition [0x00007fe248cb5000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000000d9173978> (a java.util.concurrent.locks.StampedLock)
    at java.util.concurrent.locks.StampedLock.acquireWrite(StampedLock.java:1119)
    at java.util.concurrent.locks.StampedLock.writeLock(StampedLock.java:354)
    at java.util.concurrent.locks.StampedLock$WriteLockView.lock(StampedLock.java:899)
    at org.opendaylight.mdsal.dom.spi.AbstractRegistrationTree.removeRegistration(AbstractRegistrationTree.java:82)
    at org.opendaylight.controller.sal.core.spi.data.AbstractDOMStoreTreeChangePublisher.access$000(AbstractDOMStoreTreeChangePublisher.java:29)
    at org.opendaylight.controller.sal.core.spi.data.AbstractDOMStoreTreeChangePublisher$1.removeRegistration(AbstractDOMStoreTreeChangePublisher.java:78)
    at org.opendaylight.yangtools.concepts.AbstractRegistration.close(AbstractRegistration.java:43)
    at org.opendaylight.controller.md.sal.binding.impl.BindingDataTreeChangeListenerRegistration.removeRegistration(BindingDataTreeChangeListenerRegistration.java:26)
    at org.opendaylight.yangtools.concepts.AbstractRegistration.close(AbstractRegistration.java:43)
    at org.opendaylight.bgpcep.programming.impl.DemoClass.close(DemoClass.java:61)
    - locked <0x00000000dc795ea0> (a org.opendaylight.bgpcep.programming.impl.DemoClass)
    at org.opendaylight.bgpcep.programming.impl.DemoBugTest.testInstructionDeployer(DemoBugTest.java:20)

(blocked while trying to modify the listeners)

"pool-2-thread-1" #14 prio=5 os_prio=0 tid=0x00007fe241006800 nid=0x27c6 waiting for monitor entry [0x00007fe21d036000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.opendaylight.bgpcep.programming.impl.DemoClass.onDataTreeChanged(DemoClass.java)
    - waiting to lock <0x00000000dc795ea0> (a org.opendaylight.bgpcep.programming.impl.DemoClass)
    at org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataTreeChangeListenerAdapter.onDataTreeChanged(BindingDOMDataTreeChangeListenerAdapter.java:41)
    at org.opendaylight.controller.md.sal.dom.store.impl.InMemoryDOMStoreTreeChangePublisher.lambda$static$0(InMemoryDOMStoreTreeChangePublisher.java:36)
    at org.opendaylight.controller.md.sal.dom.store.impl.InMemoryDOMStoreTreeChangePublisher$$Lambda$60/402847244.invokeListener(Unknown Source)
    at org.opendaylight.yangtools.util.concurrent.QueuedNotificationManager.lambda$null$0(QueuedNotificationManager.java:126)
    at org.opendaylight.yangtools.util.concurrent.QueuedNotificationManager$$Lambda$74/1248860520.accept(Unknown Source)
    at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:397)
    at org.opendaylight.yangtools.util.concurrent.QueuedNotificationManager.lambda$new$1(QueuedNotificationManager.java:124)
    at org.opendaylight.yangtools.util.concurrent.QueuedNotificationManager$$Lambda$59/60254480.invokeListener(Unknown Source)
    at org.opendaylight.yangtools.util.concurrent.QueuedNotificationManager$NotificationTask.invokeListener(QueuedNotificationManager.java:434)
    at org.opendaylight.yangtools.util.concurrent.QueuedNotificationManager$NotificationTask.run(QueuedNotificationManager.java:421)
    at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:260)
    at org.opendaylight.yangtools.util.concurrent.QueuedNotificationManager.runTask(QueuedNotificationManager.java:253)
    at org.opendaylight.yangtools.util.concurrent.QueuedNotificationManager.submitNotifications(QueuedNotificationManager.java:203)
    at org.opendaylight.yangtools.util.concurrent.QueuedNotificationManager.submitNotification(QueuedNotificationManager.java:168)
    at org.opendaylight.controller.md.sal.dom.store.impl.InMemoryDOMStoreTreeChangePublisher.notifyListeners(InMemoryDOMStoreTreeChangePublisher.java:52)
    at org.opendaylight.controller.sal.core.spi.data.AbstractDOMStoreTreeChangePublisher.notifyNode(AbstractDOMStoreTreeChangePublisher.java:116)
    at org.opendaylight.controller.sal.core.spi.data.AbstractDOMStoreTreeChangePublisher.notifyNode(AbstractDOMStoreTreeChangePublisher.java:123)
    at org.opendaylight.controller.sal.core.spi.data.AbstractDOMStoreTreeChangePublisher.lookupAndNotify(AbstractDOMStoreTreeChangePublisher.java:104)
    at org.opendaylight.controller.sal.core.spi.data.AbstractDOMStoreTreeChangePublisher.processCandidateTree(AbstractDOMStoreTreeChangePublisher.java:65)
    at org.opendaylight.controller.md.sal.dom.store.impl.InMemoryDOMStoreTreeChangePublisher.publishChange(InMemoryDOMStoreTreeChangePublisher.java:77)
    - locked <0x00000000d91738a8> (a org.opendaylight.controller.md.sal.dom.store.impl.InMemoryDOMStoreTreeChangePublisher)
    at org.opendaylight.controller.md.sal.dom.store.impl.InMemoryDOMDataStore.commit(InMemoryDOMDataStore.java:245)
    - locked <0x00000000d91736a0> (a org.opendaylight.controller.md.sal.dom.store.impl.InMemoryDOMDataStore)

(blocked while trying to re-enter DemoClass)

This is a classic AB/BA deadlock. close() should not hold a lock when invoking registration.close(), so that the DTCL (data tree change listener) can progress freely. It can take the lock beforehand to set up object state so that the DTCL turns into a no-op.
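
A minimal sketch of that close() pattern, for illustration only. SketchPublisher and SketchListener are hypothetical stand-ins, not the bgpcep or controller classes, and a single ReentrantLock is assumed in place of the publisher-side locking seen in the dump. The listener flags itself as closed while holding its own monitor, releases the monitor, and only then unregisters, so it never holds both locks at once:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical publisher: one lock guards both delivery and unregistration. */
final class SketchPublisher {
    private final ReentrantLock lock = new ReentrantLock();
    private SketchListener listener; // guarded by lock

    void register(SketchListener newListener) {
        lock.lock();
        try {
            listener = newListener;
        } finally {
            lock.unlock();
        }
    }

    void unregister() {
        lock.lock();
        try {
            listener = null;
        } finally {
            lock.unlock();
        }
    }

    /** Delivers a change while holding the publisher lock, as a notifier thread would. */
    void publish(String change) {
        lock.lock();
        try {
            if (listener != null) {
                listener.onDataTreeChanged(change);
            }
        } finally {
            lock.unlock();
        }
    }
}

/** Hypothetical listener; it illustrates the locking pattern only. */
final class SketchListener implements AutoCloseable {
    private final SketchPublisher publisher;
    private final AtomicInteger processed = new AtomicInteger();
    private boolean closed; // guarded by "this"

    SketchListener(SketchPublisher publisher) {
        this.publisher = publisher;
    }

    /** Takes the listener monitor; does nothing once close() has set the flag. */
    synchronized void onDataTreeChanged(String change) {
        if (!closed) {
            processed.incrementAndGet();
        }
    }

    @Override
    public void close() {
        synchronized (this) {
            if (closed) {
                return;
            }
            closed = true; // later notifications become no-ops
        }
        // The monitor is released here, so a notifier thread that holds the
        // publisher lock can still enter onDataTreeChanged() and finish.
        publisher.unregister();
    }

    int processedCount() {
        return processed.get();
    }
}
```

In this sketch the notifier thread still takes the publisher lock and then the listener monitor, but close() takes them one after the other rather than nested, so the AB/BA ordering cannot arise.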

Comment by Robert Varga [ 05/Aug/17 ]

master: https://git.opendaylight.org/gerrit/#/c/60317/
