[CONTROLLER-1592] DistributedEntityOwnershipIntegrationTest.testFunctionalityWithThreeNodes is unstable (flaky) test Created: 07/Feb/17  Updated: 25/Jul/23  Resolved: 20/Feb/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Michael Vorburger Assignee: Tom Pantelis
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 7746

 Description   

The Jenkins build https://jenkins.opendaylight.org/releng/job/controller-verify-carbon-mvn33-openjdk8/1791/console for CONTROLLER-1578's change https://git.opendaylight.org/gerrit/#/c/51486/ failed with:

00:36:54 DistributedEntityOwnershipIntegrationTest.testFunctionalityWithThreeNodes:201
00:36:54 follower2MockListener.ownershipChanged(
00:36:54 <DOMEntityOwnershipChange [entity=DOMEntity [type=entityType1, id=/(urn:opendaylight:params:xml:ns:yang:mdsal:core:general-entity?revision=2015-09-30)entity/entity[{(urn:opendaylight:params:xml:ns:yang:mdsal:core:general-entity?revision=2015-09-30)name=entity2}]], state=REMOTE_OWNERSHIP_CHANGED [wasOwner=false, isOwner=false, hasOwner=true], inJeopardy=false]>

00:34:05 Wanted 1 time:
00:34:05 -> at org.opendaylight.controller.cluster.datastore.entityownership.DistributedEntityOwnershipIntegrationTest.testFunctionalityWithThreeNodes(DistributedEntityOwnershipIntegrationTest.java:201)
00:34:05 But was 2 times. Undesired invocation:
00:34:05 -> at org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerActor.onEntityOwnershipChanged(EntityOwnershipListenerActor.java:44)

I highly doubt that the change made for CONTROLLER-1578 in c/51486 can cause this, as e.g. the DataBrokerTestCustomizer and the AbstractDataBrokerTest do not appear to be used at all by this DistributedEntityOwnershipIntegrationTest.

I tried to have a look locally: it failed once on the branch of c/51486, then passed on a local relaunch, and I couldn't get it to fail again. Using my RunUntilFailureRule from infrautils, I ran it about 27 times on both master and the branch of c/51486, and could never get it to fail again locally. So it appears to be a rare race condition or something similar (a "heisenbug"?), and we should consider DistributedEntityOwnershipIntegrationTest.testFunctionalityWithThreeNodes to be an occasionally unstable (flaky) test...

No action for the moment, but if we see it again, I can @Ignore it, or someone else could see if they can fix it?

_____
Here's how to use my RunUntilFailureRule from infrautils:

<dependency>
<groupId>org.opendaylight.infrautils</groupId>
<artifactId>infrautils-testutils</artifactId>
<version>1.1.0-SNAPSHOT</version>
<scope>test</scope>
</dependency>

public class DistributedEntityOwnershipIntegrationTest {

    public static @ClassRule RunUntilFailureClassRule classRepeater = new RunUntilFailureClassRule();
    public @Rule RunUntilFailureRule repeater = new RunUntilFailureRule(classRepeater);

    ....



 Comments   
Comment by Tomas Cere [ 08/Feb/17 ]

Encountered it in other patches as well, but I wasn't able to reproduce it locally either. Here, for example: https://git.opendaylight.org/gerrit/#/c/50622/

Comment by Tom Pantelis [ 16/Feb/17 ]

I was able to reproduce the failure after 36 runs. There's a timing issue with the test but the failure also revealed that EntityOwnershipListenerSupport needs to be thread-safe, i.e. access to the listenerActorMap and entityTypeListenerMap needs to be synchronized. Not sure how this slipped thru the cracks before...
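To illustrate the kind of thread-safety meant here, the following is a minimal sketch only, not the actual change in c/51941. The class ListenerRegistrySketch, its String-based maps and its method names are hypothetical stand-ins; only the two field names are borrowed from the comment above. It guards every access to both maps with the same lock so they cannot be seen in an inconsistent state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in; NOT the real EntityOwnershipListenerSupport.
class ListenerRegistrySketch {
    // Assumption: listeners and entity types are plain Strings here.
    // listenerActorMap counts how many entity types each listener is registered for.
    private final Map<String, Integer> listenerActorMap = new HashMap<>();
    private final Map<String, Set<String>> entityTypeListenerMap = new HashMap<>();

    // All methods synchronize on 'this', so both maps are always updated together.
    synchronized void addListener(String entityType, String listener) {
        Set<String> listeners = entityTypeListenerMap.computeIfAbsent(entityType, k -> new HashSet<>());
        if (listeners.add(listener)) {
            listenerActorMap.merge(listener, 1, Integer::sum);
        }
    }

    synchronized void removeListener(String entityType, String listener) {
        Set<String> listeners = entityTypeListenerMap.get(entityType);
        if (listeners != null && listeners.remove(listener)) {
            if (listeners.isEmpty()) {
                entityTypeListenerMap.remove(entityType);
            }
            // Returning null from the remapping function drops the entry.
            listenerActorMap.computeIfPresent(listener, (k, count) -> count == 1 ? null : count - 1);
        }
    }

    // Returns a copy so callers can iterate without holding the lock.
    synchronized List<String> listenersFor(String entityType) {
        return new ArrayList<>(entityTypeListenerMap.getOrDefault(entityType, Collections.emptySet()));
    }

    synchronized int listenerCount() {
        return listenerActorMap.size();
    }
}
```

Without the synchronized keyword, concurrent calls on the plain HashMap instances could lose updates or corrupt the maps, which is the class of problem the comment describes.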

I pushed https://git.opendaylight.org/gerrit/#/c/51941. After the changes it ran successfully 100 times.

BTW - I have a CLI script that runs "mvn test ..." continuously until failure, similar to RunUntilFailureRule, but it's nice b/c you don't have to temporarily change code.
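The script itself is not attached to this issue. As a rough sketch of the same idea, assuming a hypothetical class RunUntilFailure with made-up helper names and an arbitrary limit of 100 runs, a small Java program could rerun an external command until it exits with a non-zero status:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.IntSupplier;

// Hypothetical sketch; not the CLI script mentioned in the comment.
class RunUntilFailure {

    // Calls 'attempt' up to maxRuns times; returns the 1-based number of the
    // first run that returned a non-zero exit code, or 0 if every run passed.
    static int runUntilFailure(IntSupplier attempt, int maxRuns) {
        for (int run = 1; run <= maxRuns; run++) {
            if (attempt.getAsInt() != 0) {
                return run;
            }
        }
        return 0;
    }

    // Runs an external command, sharing this process's stdout/stderr,
    // and returns its exit code.
    static int runCommand(String... command) {
        try {
            return new ProcessBuilder(command).inheritIO().start().waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: RunUntilFailure <command> [args...]");
            return;
        }
        int failedRun = runUntilFailure(() -> runCommand(args), 100);
        System.out.println(failedRun == 0 ? "No failure in 100 runs" : "Failed on run " + failedRun);
    }
}
```

With Java 11 or later this could be launched as, for example, java RunUntilFailure.java mvn test -Dtest=DistributedEntityOwnershipIntegrationTest#testFunctionalityWithThreeNodes, which leaves the test source untouched.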
