[CONTROLLER-1678] Futures timed out after short isolation Created: 15/May/17  Updated: 25/Jul/23  Resolved: 18/May/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Vratko Polak Assignee: Tomas Cere
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 8450

 Description   

This is basically CONTROLLER-1660, but this time the tell-based protocol (and a prefix-based shard) is used.

Less than 12 seconds after the isolation, the transaction writer still fails [0].
Errors (without stack traces) are:
Problem accessing /restconf/operations/odl-mdsal-lowlevel-control:produce-transactions. Reason:
Server Error
Caused by:
org.opendaylight.controller.cluster.datastore.exceptions.TimeoutException: Sending message class org.opendaylight.controller.cluster.sharding.messages.ProducerCreated to actor Actorakka://opendaylight-cluster-data/user/ShardedDOMDataTreeFrontend#-1001725550 failed. Try again later.
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [5000 milliseconds]

Looking at karaf.log [1], the only suspicious message is an AskTimeoutException on a prefix-based shard. Perhaps it was caused by a Jolokia request?
2017-05-14 06:29:28,171 | WARN | monPool-worker-2 | AbstractShardBackendResolver | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.Carbon | Failed to resolve shard
akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@10.29.14.191:2550/), Path(/user/shardmanager-config/member-2-shard-id-ints!-config)]] after [5000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.access.commands.ConnectClientRequest".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)[173:com.typesafe.akka.actor:2.4.17]
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)[173:com.typesafe.akka.actor:2.4.17]
at java.lang.Thread.run(Thread.java:745)[:1.8.0_121]
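
For illustration only: both timeouts above (5000 ms) have the shape of a bounded wait on a reply that never arrives. The sketch below reproduces that shape with plain java.util.concurrent. It does not use Akka or any controller code, and the class and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class AskTimeoutSketch {
    // Waits up to timeoutMillis for a reply; returns "timeout" if none arrives in time.
    static String awaitReply(CompletableFuture<String> reply, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        try {
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The other side never answered within the bound.
            return "timeout";
        }
    }

    public static void main(String[] args) throws Exception {
        // A future that is never completed stands in for an unreachable actor.
        CompletableFuture<String> neverAnswered = new CompletableFuture<>();
        System.out.println(awaitReply(neverAnswered, 50));

        // An already completed future stands in for a healthy reply.
        System.out.println(awaitReply(CompletableFuture.completedFuture("ok"), 50));
    }
}
```

The timeout only says that no reply arrived within the bound; it does not say why, which is why the log alone does not identify the cause.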

[0] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/684/archives/log.html.gz#s1-s35-t1-k2-k15
[1] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/684/archives/odl3_karaf.log.gz



 Comments   
Comment by Tomas Cere [ 17/May/17 ]

It seems the producer was incorrectly closed on another node, which apparently can only happen when the initial fill-up of the id-ints list fails.

Proper closing of the producer is part of: https://git.opendaylight.org/gerrit/#/c/57132/

We'll see if this emerges again and go from there.
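
As a rough illustration of closing a producer reliably when the initial fill-up fails, here is a minimal Java sketch. It is not the code from the Gerrit change: `Producer` and `fillInitialList` are hypothetical stand-ins, and the only point shown is that try-with-resources closes the resource even when the fill-up step throws.

```java
public class ProducerCloseSketch {
    // Hypothetical stand-in for a data tree producer; not the controller API.
    static final class Producer implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    // Hypothetical initial fill-up step that may fail.
    static void fillInitialList(Producer producer, boolean fail) {
        if (fail) {
            throw new IllegalStateException("initial fill-up failed");
        }
    }

    // Runs the fill-up and returns the producer so callers can inspect its state.
    static Producer run(boolean failFillUp) {
        Producer producer = new Producer();
        try (Producer p = producer) {
            fillInitialList(p, failFillUp);
        } catch (IllegalStateException e) {
            // By the time this block runs, try-with-resources has already closed p.
        }
        return producer;
    }

    public static void main(String[] args) {
        System.out.println(run(true).closed);
        System.out.println(run(false).closed);
    }
}
```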

Comment by Vratko Polak [ 18/May/17 ]

> https://git.opendaylight.org/gerrit/#/c/57132/

Merged, and the symptom has not been seen since. Marking as fixed.

Generated at Wed Feb 07 21:54:11 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.