[CONTROLLER-1633] Singleton global rpc timed out with akka.pattern.AskTimeoutException for 15000ms after leader kill Created: 11/Apr/17 Updated: 25/Jul/23 Resolved: 02/May/17 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug |
| Reporter: | Peter Gubka | Assignee: | Unassigned |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Operating System: All |
| Attachments: | odl1_karaf.log.gz, odl2_karaf.log.gz, odl3_karaf.log.gz, odl_logs.zip, with-remote-rpc-debug.tar |
| External issue ID: | 8207 |
| Description |
|
Suite: Global Rpc Kill

Steps:

{"errors":{"error":[{"error-type":"application","error-tag":"operation-failed","error-message":"The operation encountered an unexpected error while executing.","error-info":"akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://opendaylight-cluster-data@10.29.12.23:2550/user/rpc/broker#526563179]] after [15000 ms]. Sender[null] sent message of type "org.opendaylight.controller.remote.rpc.messages.ExecuteRpc".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
at java.lang.Thread.run(Thread.java:745)"}]}}
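For background, the error-info above is an akka ask timeout: the ExecuteRpc message was sent to an rpc broker actor that never replied within 15000 ms, which is what happens when the referenced broker belongs to the killed leader. The following is a minimal, self-contained sketch (not the actual controller code; actor and message names are placeholders) of how such an ask surfaces as AskTimeoutException:

```java
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.pattern.Patterns;
import scala.concurrent.Await;
import scala.concurrent.Future;
import scala.concurrent.duration.Duration;
import java.util.concurrent.TimeUnit;

public class AskTimeoutSketch {

    // An actor that deliberately ignores every message, standing in for a broker
    // whose remote peer has disappeared after the leader was killed.
    static class SilentBroker extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder().matchAny(msg -> { /* never reply */ }).build();
        }
    }

    public static void main(String[] args) throws Exception {
        ActorSystem system = ActorSystem.create("sketch");
        ActorRef broker = system.actorOf(Props.create(SilentBroker.class), "rpc-broker");

        // Ask with the same 15000 ms timeout seen in the bug report.
        Future<Object> reply = Patterns.ask(broker, "ExecuteRpc", 15000);
        try {
            Await.result(reply, Duration.create(20, TimeUnit.SECONDS));
        } catch (akka.pattern.AskTimeoutException e) {
            // This is the exception that gets wrapped into the "operation-failed" error above.
            System.out.println("Ask timed out: " + e.getMessage());
        } finally {
            system.terminate();
        }
    }
}
```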
| Comments |
| Comment by Robert Varga [ 12/Apr/17 ] |
|
Unfortunately node2 logs are not showing much. Two questions:
|
| Comment by Peter Gubka [ 12/Apr/17 ] |
|
(In reply to Robert Varga from comment #1) The karaf.log files will be attached; the link to the job is https://jenkins.opendaylight.org/sandbox/job/controller-csit-3node-clustering-only-carbon/3/ |
| Comment by Peter Gubka [ 12/Apr/17 ] |
|
Attachment odl1_karaf.log.gz has been added with description: karaf log 1 |
| Comment by Peter Gubka [ 12/Apr/17 ] |
|
Attachment odl2_karaf.log.gz has been added with description: karaf log 2 |
| Comment by Peter Gubka [ 12/Apr/17 ] |
|
Attachment odl3_karaf.log.gz has been added with description: karaf log 3 |
| Comment by Peter Gubka [ 20/Apr/17 ] |
|
> - does this occur repeatedly for more than ~2 seconds after the new leader
I repeated the test. I tried the rpc again after the first attempt failed with a timeout; because of the duration of the first attempt, the second attempt was made at least 15 seconds after the new leader was elected. The response was the same: the second attempt failed with the same 15000 ms timeout exception. Logs from all 3 nodes are attached. Robot's log.html is too big. |
| Comment by Peter Gubka [ 20/Apr/17 ] |
|
Attachment odl_logs.zip has been added with description: logs with retried rpc |
| Comment by Robert Varga [ 27/Apr/17 ] |
|
Those logs do not contain remote-rpc debug messages; we need those to understand what is going on. |
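For reference (not part of the original comment): one common way to capture those debug messages is to raise the log level for the remote rpc package from the Karaf console before running the test. The package name below is taken from the stack trace in the description; adjust if a narrower logger is wanted.

```
karaf> log:set DEBUG org.opendaylight.controller.remote.rpc
```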
| Comment by Peter Gubka [ 27/Apr/17 ] |
|
(In reply to Robert Varga from comment #8) https://jenkins.opendaylight.org/sandbox/job/controller-csit-3node-clustering-only-carbon/19/ Should I attach the logs, or is the link enough? |
| Comment by Peter Gubka [ 28/Apr/17 ] |
|
Adding files from https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/19/ |
| Comment by Peter Gubka [ 28/Apr/17 ] |
|
Attachment with-remote-rpc-debug.tar has been added with description: log with debug level |
| Comment by Peter Gubka [ 02/May/17 ] |
|
https://git.opendaylight.org/gerrit/56393 This is an update on the test suite side. Waiting for the new owner is not enough: a node only updates its rpc reference to the new node after it receives the akka notification, which is why the invocation always failed on follower nodes. A WUKS (Wait Until Keyword Succeeds) was added to wait for the akka notification. |
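For illustration only (the actual fix in the gerrit change is on the Robot suite side): the WUKS pattern amounts to retrying the RPC invocation until the followers have processed the akka notification and repointed their rpc reference to the new owner, or a deadline expires. Below is a minimal Java analogue of that retry loop; invokeGlobalRpc() in the usage comment is a hypothetical placeholder, not a real controller API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

public final class RetryUntilSucceeds {

    // Keep retrying the call until it succeeds or the deadline expires,
    // sleeping pollMillis between attempts -- the same idea as the Robot
    // "Wait Until Keyword Succeeds" wrapper around the rpc invocation.
    public static <T> T retry(Callable<T> call, long deadlineMillis, long pollMillis)
            throws Exception {
        long deadline = System.currentTimeMillis() + deadlineMillis;
        Exception last = null;
        while (System.currentTimeMillis() < deadline) {
            try {
                return call.call();        // success: rpc was routed to the new owner
            } catch (Exception e) {
                last = e;                  // e.g. the 15000 ms AskTimeoutException
                TimeUnit.MILLISECONDS.sleep(pollMillis);
            }
        }
        throw last != null ? last : new IllegalStateException("deadline expired");
    }

    // Usage sketch: RetryUntilSucceeds.retry(() -> invokeGlobalRpc(input), 60_000, 2_000);
}
```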