[CONTROLLER-1689] stopping resolution of shard 0 on stale connection: AskTimeoutException Created: 19/May/17  Updated: 25/Jul/23  Resolved: 22/May/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Vratko Polak Assignee: Robert Varga
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Blocks
blocks CONTROLLER-1665 C: write-transactions does not return... Resolved
Duplicate
is duplicated by CONTROLLER-1690 Frontend client tries to reconnect to... Resolved
External issue ID: 8511

 Description   

This is the karaf.log [0] symptom for robot failure [1] already reported as [2].

Scenario: Module-based shard with tell-based protocol, single writer active on the leader member. Then remove-shard-replica is called, the writer (30s duration) should report success within 180 seconds.

The suspicious log segment:
017-05-19 05:05:34,352 | INFO | lt-dispatcher-31 | EmptyLocalActorRef | 174 - com.typesafe.akka.slf4j - 2.4.17 | Message [org.opendaylight.controller.cluster.access.commands.ConnectClientRequest] from Actor[akka://opendaylight-cluster-data/temp/$g] to Actorakka://opendaylight-cluster-data/user/shardmanager-config/member-2-shard-default-config#-863014827 was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
2017-05-19 05:05:39,370 | INFO | lt-dispatcher-21 | ClientActorBehavior | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.Carbon | member-2-frontend-datastore-config: stopping resolution of shard 0 on stale connection ReconnectingClientConnection{client=ClientIdentifier

{frontend=member-2-frontend-datastore-config, generation=0}

, cookie=0, backend=ShardBackendInfo{actor=Actorakka://opendaylight-cluster-data/user/shardmanager-config/member-2-shard-default-config#-863014827, sessionId=0, version=BORON, maxMessages=1000, cookie=0, shard=default, dataTree present=true}}
akka.pattern.AskTimeoutException: Ask timed out on ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-2-shard-default-config#-863014827)] after [5000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.access.commands.ConnectClientRequest".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)[173:com.typesafe.akka.actor:2.4.17]
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)[173:com.typesafe.akka.actor:2.4.17]
at java.lang.Thread.run(Thread.java:745)[:1.8.0_121]
2017-05-19 05:05:39,371 | WARN | monPool-worker-3 | AbstractShardBackendResolver | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.Carbon | Failed to resolve shard
akka.pattern.AskTimeoutException: Ask timed out on ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-2-shard-default-config#-863014827)] after [5000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.access.commands.ConnectClientRequest".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)[173:com.typesafe.akka.actor:2.4.17]
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)[173:com.typesafe.akka.actor:2.4.17]
at java.lang.Thread.run(Thread.java:745)[:1.8.0_121]

[0] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/697/archives/odl2_karaf.log.gz
[1] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/697/archives/log.html.gz#s1-s20-t1-k2-k9
[2] https://bugs.opendaylight.org/show_bug.cgi?id=8403#c3



 Comments   
Comment by Robert Varga [ 19/May/17 ]

Stronger locking and more defensive messages: https://git.opendaylight.org/gerrit/57431

Generated at Wed Feb 07 21:54:13 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.