[CONTROLLER-1518] DataChangeListener is not notified of DS change occasionally Created: 18/May/16  Updated: 19/Oct/17  Resolved: 15/Jun/16

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: 0.4.0
Fix Version/s: None

Type: Bug
Reporter: Hideyuki Tai Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 5913

 Description   

We have observed that DataChangeListener was not notified of data change occasionally.
This issue happens on the master branch (for Boron development).

When the issue happened, we observed the following exception was caused during DS update.

2016-05-16 15:17:21,371 | ERROR | lt-dispatcher-32 | OneForOneStrategy | 184 - com.typesafe.akka.slf4j - 2.4.4 | This stopwatch is already running.
java.lang.IllegalStateException: This stopwatch is already running.
at com.google.common.base.Preconditions.checkState(Preconditions.java:173)[83:com.google.guava:18.0.0]
at com.google.common.base.Stopwatch.start(Stopwatch.java:160)[83:com.google.guava:18.0.0]
at org.opendaylight.controller.cluster.datastore.DefaultShardDataChangeListenerPublisher.publishChanges(DefaultShardDataChangeListenerPublisher.java:59)[199:org.opendaylight.controller.sal-distributed-datastore:1.4.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.ShardDataTreeNotificationPublisherActor$PublishNotifications.publish(ShardDataTreeNotificationPublisherActor.java:46)[199:org.opendaylight.controller.sal-distributed-datastore:1.4.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.ShardDataTreeNotificationPublisherActor$PublishNotifications.access$000(ShardDataTreeNotificationPublisherActor.java:33)[199:org.opendaylight.controller.sal-distributed-datastore:1.4.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.ShardDataTreeNotificationPublisherActor.handleReceive(ShardDataTreeNotificationPublisherActor.java:25)[199:org.opendaylight.controller.sal-distributed-datastore:1.4.0.SNAPSHOT]
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedActor.onReceive(AbstractUntypedActor.java:25)[195:org.opendaylight.controller.sal-clustering-commons:1.4.0.SNAPSHOT]
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)[183:com.typesafe.akka.actor:2.4.4]
at akka.actor.Actor$class.aroundReceive(Actor.scala:482)[183:com.typesafe.akka.actor:2.4.4]
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)[183:com.typesafe.akka.actor:2.4.4]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[183:com.typesafe.akka.actor:2.4.4]
at akka.actor.ActorCell.invoke(ActorCell.scala:495)[183:com.typesafe.akka.actor:2.4.4]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[183:com.typesafe.akka.actor:2.4.4]
at akka.dispatch.Mailbox.run(Mailbox.scala:224)[183:com.typesafe.akka.actor:2.4.4]
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[183:com.typesafe.akka.actor:2.4.4]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[179:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[179:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[179:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[179:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]

Due to this issue, IT for VTN Manager on the master branch fails occasionally.

https://jenkins.opendaylight.org/releng/view/vtn/job/vtn-verify-boron/138/
https://jenkins.opendaylight.org/releng/view/vtn/job/vtn-verify-boron/129/



 Comments   
Comment by Hideyuki Tai [ 18/May/16 ]

We've investigated this issue.
We are thinking that the root cause of this issue is in the sal-distributed-datastore.

When VTN project used sal-inmemory-datastore as the backend of DS, VTN project has never faced this issue. However, after VTN project migrated it's IT into mdsal-it-base, the VTN IT uses the sal-distributed-datastore, and the issue has showed up.

What happened when the IllegalStateException occurred?
In the failure case, the publishChanges() of DefaultShardDataChangeListenerPublisher class was called from multiple threads at the same time.
Then, Stopwatch.start() was called even though the stopwatch was already started on a different thread for the same instance of Stopwatch class.

[opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/DefaultShardDataChangeListenerPublisher.java]

33 final class DefaultShardDataChangeListenerPublisher implements ShardDataChangeListenerPublisher,
34 NotificationManager<DataChangeListenerRegistration<?>, DOMImmutableDataChangeEvent> {
(snip)

38 private final Stopwatch timer = Stopwatch.createUnstarted();

(snip)

57 @Override
58 public void publishChanges(DataTreeCandidate candidate, String logContext) {
59 timer.start();
60
61 try

{ 62 ResolveDataChangeEventsTask.create(candidate, dataChangeListenerTree).resolve(this); 63 }

finally {
64 timer.stop();
65 long elapsedTime = timer.elapsed(TimeUnit.MILLISECONDS);
66 if(elapsedTime >= PUBLISH_DELAY_THRESHOLD_IN_MS) {
67 LOG.warn("{}: Generation of DataChange events took longer than expected. Elapsed time: {}",
68 logContext, timer);
69 } else {
70 LOG.debug("{}: Elapsed time for generation of DataChange events: {}", logContext, timer);
71 }
72
73 timer.reset();
74 }
75 }

Here, we are talking about DefaultShardDataChangeListenerPublisher class so far.
However, we think the DefaultShardDataTreeChangeListenerPublisher has also the same issue.

Comment by Hideyuki Tai [ 12/Jun/16 ]

IT for VTN Manager on the master branch still failed on 06/11/2016 due to the CONTROLLER-1518.
https://jenkins.opendaylight.org/releng/job/vtn-integration-boron/896/org.opendaylight.vtn$manager.it.core/console

Comment by Tom Pantelis [ 14/Jun/16 ]

Submitted https://git.opendaylight.org/gerrit/#/c/40306/

Comment by Tom Pantelis [ 15/Jun/16 ]

Be patch: https://git.opendaylight.org/gerrit/#/c/40324/

Generated at Wed Feb 07 19:55:45 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.