[CONTROLLER-1656] DistributedShardedDOMDataTreeRemotingTest.testMultipleRegistrationsAtOnePrefix failing intermittently
Created: 05/May/17  Updated: 25/Jul/23  Resolved: 11/May/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Tom Pantelis
Assignee: Tom Pantelis
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 8385

 Description   

DistributedShardedDOMDataTreeRemotingTest.testMultipleRegistrationsAtOnePrefix is failing intermittently on master; I have seen it twice so far. I haven't seen it fail on stable/carbon.

00:49:19 org.opendaylight.controller.cluster.sharding.DOMDataTreeShardCreationFailedException: Unable to create a cds shard.
00:49:19 at org.opendaylight.controller.cluster.sharding.DistributedShardedDOMDataTree$2$2.apply(DistributedShardedDOMDataTree.java:358)
00:49:19 at org.opendaylight.controller.cluster.sharding.DistributedShardedDOMDataTree$2$2.apply(DistributedShardedDOMDataTree.java:355)
00:49:19 at scala.concurrent.Future$$anonfun$transform$1$$anonfun$apply$2.apply(Future.scala:224)
00:49:19 at scala.concurrent.Future$$anonfun$transform$1$$anonfun$apply$2.apply(Future.scala:224)
00:49:19 at scala.util.Try$.apply(Try.scala:192)
00:49:19 at scala.concurrent.Future$$anonfun$transform$1.apply(Future.scala:224)
00:49:19 at scala.concurrent.Future$$anonfun$transform$1.apply(Future.scala:222)
00:49:19 at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
00:49:19 at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
00:49:19 at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
00:49:19 at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
00:49:19 at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
00:49:19 at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
00:49:19 at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
00:49:19 at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
00:49:19 at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
00:49:19 at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
00:49:19 at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
00:49:19 at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
00:49:19 at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
00:49:19 Caused by: org.opendaylight.controller.cluster.sharding.DOMDataTreeShardCreationFailedException: Unable to find the backend shard.Failing..
00:49:19 at org.opendaylight.controller.cluster.sharding.LookupTask.fail(LookupTask.java:50)
00:49:19 at org.opendaylight.controller.cluster.sharding.LookupTask.tryReschedule(LookupTask.java:40)
00:49:19 at org.opendaylight.controller.cluster.sharding.ShardedDataTreeActor$ShardCreationLookupTask$1.onComplete(ShardedDataTreeActor.java:451)
00:49:19 at org.opendaylight.controller.cluster.sharding.ShardedDataTreeActor$ShardCreationLookupTask$1.onComplete(ShardedDataTreeActor.java:447)
00:49:19 at akka.dispatch.OnComplete.internal(Future.scala:258)
00:49:19 at akka.dispatch.OnComplete.internal(Future.scala:256)
00:49:19 at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
00:49:19 at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
00:49:19 ... 13 more
00:49:19 Caused by: org.opendaylight.controller.cluster.datastore.exceptions.LocalShardNotFoundException: Local shard for test! does not exist.
00:49:19 at org.opendaylight.controller.cluster.datastore.utils.ActorContext$3.checkedApply(ActorContext.java:294)
00:49:19 at org.opendaylight.controller.cluster.datastore.utils.ActorContext$3.checkedApply(ActorContext.java:283)
00:49:19 at akka.dispatch.Mapper.apply(Future.scala:359)
00:49:19 at scala.util.Success$$anonfun$map$1.apply(Try.scala:237)
00:49:19 at scala.util.Try$.apply(Try.scala:192)
00:49:19 at scala.util.Success.map(Try.scala:237)
00:49:19 at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
00:49:19 at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
00:49:19 ... 13 more

This is the underlying error:

[ERROR] [05/05/2017 00:47:21.633] [cluster-test-akka.actor.default-dispatcher-4] [akka://cluster-test/user/shardmanager-config] actor name [member-2-shard-test!-config] is not unique!
akka.actor.InvalidActorNameException: actor name [member-2-shard-test!-config] is not unique!
at akka.actor.dungeon.ChildrenContainer$NormalChildrenContainer.reserve(ChildrenContainer.scala:129)
at akka.actor.dungeon.Children$class.reserveChild(Children.scala:130)
at akka.actor.ActorCell.reserveChild(ActorCell.scala:374)
at akka.actor.dungeon.Children$class.makeChild(Children.scala:268)
at akka.actor.dungeon.Children$class.actorOf(Children.scala:42)
at akka.actor.ActorCell.actorOf(ActorCell.scala:374)
at org.opendaylight.controller.cluster.datastore.shardmanager.ShardManager.newShardActor(ShardManager.java:1067)
at org.opendaylight.controller.cluster.datastore.shardmanager.ShardManager.doCreatePrefixShard(ShardManager.java:574)
at org.opendaylight.controller.cluster.datastore.shardmanager.ShardManager.onPrefixShardCreated(ShardManager.java:551)
at org.opendaylight.controller.cluster.datastore.shardmanager.ShardManager.handleCommand(ShardManager.java:258)
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:31)

The test creates and destroys the "test" shard 10 times. It failed on the second round (Round 1). It seems the "test" shard actor from the prior round hadn't been fully destroyed when the next round tried to create it, so the actor name was still reserved.
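
The suspected race can be illustrated with a toy model. This is only a sketch using plain JDK classes, not the controller or Akka code: NameRegistry, ShardRecreateSketch, and the 100 ms stop delay are all hypothetical stand-ins for a parent actor that reserves child names and releases them only once the child has finished stopping. It shows that re-creating immediately after requesting a stop collides on the name, while waiting for the stop to complete does not.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a parent actor that enforces unique child names. */
final class NameRegistry {
    private final Set<String> reserved = ConcurrentHashMap.newKeySet();

    /** Reserves the name; fails if a previous holder has not finished stopping. */
    void create(String name) {
        if (!reserved.add(name)) {
            throw new IllegalStateException("actor name [" + name + "] is not unique!");
        }
    }

    /** Requests an asynchronous stop; the name is released just before the future completes. */
    CompletableFuture<Void> stop(String name) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(100); // assumed shutdown delay, purely illustrative
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            reserved.remove(name);
        });
    }
}

public class ShardRecreateSketch {
    private static final String NAME = "member-2-shard-test!-config";

    /** Re-creates right after requesting a stop; returns true if the name collided. */
    static boolean recreateWithoutWaiting() {
        NameRegistry registry = new NameRegistry();
        registry.create(NAME);
        registry.stop(NAME); // stop requested, but not awaited
        try {
            registry.create(NAME);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    /** Runs create/stop rounds, waiting for each stop to finish; returns rounds completed. */
    static int runRounds(int rounds) {
        NameRegistry registry = new NameRegistry();
        int completed = 0;
        for (int round = 0; round < rounds; round++) {
            registry.create(NAME);
            registry.stop(NAME).join(); // block until the name has been released
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) {
        System.out.println("collision without waiting: " + recreateWithoutWaiting());
        System.out.println("rounds completed when waiting: " + runRounds(10));
    }
}
```

Whether the actual fix waits for termination in this way is not stated in this issue; the sketch only models the symptom described above.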



 Comments   
Comment by Tom Pantelis [ 05/May/17 ]

Submitted https://git.opendaylight.org/gerrit/#/c/56590/

Comment by Jakub Morvay [ 08/May/17 ]

Even with https://git.opendaylight.org/gerrit/#/c/56590/ we still see intermittent failures in the DistributedShardedDOMDataTreeRemotingTest.testMultipleRegistrationsAtOnePrefix test.

This is from https://jenkins.opendaylight.org/releng/job/autorelease-release-nitrogen/12/console:

03:29:56 testMultipleRegistrationsAtOnePrefix(org.opendaylight.controller.cluster.sharding.DistributedShardedDOMDataTreeRemotingTest) Time elapsed: 18.911 sec <<< FAILURE!
03:29:56 java.lang.AssertionError
03:29:56 at org.junit.Assert.fail(Assert.java:86)
03:29:56 at org.junit.Assert.assertTrue(Assert.java:41)
03:29:56 at org.junit.Assert.assertNotNull(Assert.java:621)
03:29:56 at org.junit.Assert.assertNotNull(Assert.java:631)
03:29:56 at org.opendaylight.controller.cluster.sharding.DistributedShardedDOMDataTreeRemotingTest.testMultipleRegistrationsAtOnePrefix(DistributedShardedDOMDataTreeRemotingTest.java:401)

Created patch https://git.opendaylight.org/gerrit/#/c/56658/ to ignore this test for now.

Comment by Tom Pantelis [ 09/May/17 ]

I found the actual issue and will push a patch soon.

Comment by Tom Pantelis [ 09/May/17 ]

Submitted https://git.opendaylight.org/gerrit/#/c/56722/
