[OPNFLWPLUG-876] Send RPC to non-owner after original owner is killed is not stable in Carbon Created: 07/Apr/17  Updated: 27/Sep/21  Resolved: 22/May/17

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Luis Gomez Assignee: Luis Gomez
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: karaf-1-new-owner.txt, karaf-2-old-owner.txt, karaf-3-candidate-RPC.txt
External issue ID: 8185

 Description   

As tracked here:

https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-3node-clustering-only-carbon/584/robot/openflowplugin-clustering.txt/Cluster%20HA%20Owner%20Failover/Send%20RPC%20Add%20to%20Successor%20and%20Verify%20After%20Fail/

We get an error when trying to push a flow via RPC from a non-owner instance after the original owner is killed:

{"errors":{"error":[{"error-type":"application","error-tag":"operation-failed","error-message":"The operation encountered an unexpected error while executing.","error-info":"akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://opendaylight-cluster-data@10.29.12.203:2550/user/rpc/broker#-1313599470]] after [15000 ms]. Sender[null] sent message of type \"org.opendaylight.controller.remote.rpc.messages.ExecuteRpc\".\n\tat akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)\n\tat akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)\n\tat scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)\n\tat scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)\n\tat scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)\n\tat akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)\n\tat akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)\n\tat akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)\n\tat akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)\n\tat java.lang.Thread.run(Thread.java:745)\n"}]}}
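For triage, an error body like the one above can be checked mechanically for the Akka ask timeout. The following is only a minimal sketch: the helper name find_ask_timeouts and the regular expression are assumptions for illustration and are not part of OpenDaylight or the CSIT suite; the sample body is an abbreviated copy of the response above, without the stack trace.

```python
import json
import re

# Hypothetical helper (not an OpenDaylight API): the pattern is modelled on the
# "error-info" text shown in this report and may not cover other message shapes.
ASK_TIMEOUT = re.compile(
    r"AskTimeoutException: Ask timed out on \[Actor\[(?P<actor>[^\]]+)\]\] "
    r"after \[(?P<ms>\d+) ms\]"
)


def find_ask_timeouts(body):
    """Return a list of (actor_path, timeout_ms) found in a RESTCONF error body."""
    payload = json.loads(body)
    hits = []
    for err in payload.get("errors", {}).get("error", []):
        match = ASK_TIMEOUT.search(err.get("error-info", ""))
        if match:
            hits.append((match.group("actor"), int(match.group("ms"))))
    return hits


# Abbreviated copy of the error body from this report (stack trace omitted).
sample = json.dumps({"errors": {"error": [{
    "error-type": "application",
    "error-tag": "operation-failed",
    "error-message": "The operation encountered an unexpected error while executing.",
    "error-info": "akka.pattern.AskTimeoutException: Ask timed out on "
                  "[Actor[akka.tcp://opendaylight-cluster-data@10.29.12.203:2550"
                  "/user/rpc/broker#-1313599470]] after [15000 ms]. Sender[null] "
                  "sent message of type "
                  "\"org.opendaylight.controller.remote.rpc.messages.ExecuteRpc\".",
}]}})

for actor, timeout_ms in find_ask_timeouts(sample):
    print(actor, timeout_ms)
```

The actor path names the cluster member whose RPC broker did not answer, which may help when matching the failure against the per-node logs.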



 Comments   
Comment by Anil Vishnoi [ 14/Apr/17 ]

Luis, do you see the device getting connected to another controller?

Comment by Luis Gomez [ 14/Apr/17 ]

Yes, because the same RPC works from the new owner:

https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-3node-clustering-only-carbon/584/archives/log.html.gz#s1-s1

The test:

  • kills the owner
  • sends an RPC from the new owner -> OK
  • sends an RPC from the remaining candidate -> FAIL
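The three steps listed above can be sketched as a small harness. This is an illustration only, not the actual Robot test: the function name and the kill_member and send_rpc callables are hypothetical placeholders for whatever the test environment uses to stop a controller and to issue the RPC.

```python
def check_rpc_after_owner_kill(members, owner, kill_member, send_rpc):
    """Kill the owner, then try the RPC on every surviving member.

    members     -- list of cluster member names (hypothetical identifiers)
    owner       -- the member that currently owns the device
    kill_member -- callable(member); assumed to stop that controller
    send_rpc    -- callable(member) -> bool; assumed to return True on success
    Returns a dict mapping each surviving member to its RPC result.
    """
    kill_member(owner)
    survivors = [m for m in members if m != owner]
    return {m: send_rpc(m) for m in survivors}


# Fakes reproducing the outcome described in this report: the new owner
# accepts the RPC, the remaining candidate does not.
killed = []
outcome = check_rpc_after_owner_kill(
    members=["karaf-1", "karaf-2", "karaf-3"],
    owner="karaf-2",
    kill_member=killed.append,
    send_rpc=lambda member: member == "karaf-1",
)
print(outcome)
```

In a healthy cluster every surviving member would be expected to return True; the bug shows up as a False entry for the member that is not the new owner.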

BR/Luis

Comment by Anil Vishnoi [ 14/Apr/17 ]

Does it fail even if you retry multiple times?

Comment by Luis Gomez [ 14/Apr/17 ]

The add request takes 15 seconds to fail; after that, there is a delete request that also takes 15 seconds.
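To answer the retry question with data, each attempt can be timed individually. The sketch below is an assumption about how one might do that, not something the test currently does: retry_rpc and its send_rpc argument are hypothetical, and the clock is injectable so the helper can be exercised without waiting for real timeouts.

```python
import time


def retry_rpc(send_rpc, attempts=3, clock=time.monotonic):
    """Call send_rpc up to `attempts` times, stopping at the first success.

    send_rpc is a hypothetical zero-argument callable that raises on failure.
    Returns a list of (succeeded, elapsed_seconds), one entry per attempt made.
    """
    results = []
    for _ in range(attempts):
        start = clock()
        try:
            send_rpc()
            ok = True
        except Exception:
            ok = False
        results.append((ok, clock() - start))
        if ok:
            break
    return results
```

A run of entries that are all failures, each lasting about 15 seconds, would suggest the request is hitting the same ask timeout on every attempt rather than failing once during failover.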

Comment by Luis Gomez [ 21/Apr/17 ]

Attachment karaf-1-new-owner.txt has been added with description: new owner log

Comment by Luis Gomez [ 21/Apr/17 ]

Attachment karaf-2-old-owner.txt has been added with description: old owner log

Comment by Luis Gomez [ 21/Apr/17 ]

Attachment karaf-3-candidate-RPC.txt has been added with description: candidate-RPC-fails

Comment by Luis Gomez [ 21/Apr/17 ]

Uploaded cluster logs. The switch being tested is openflow:1; you can ignore openflow:2 and openflow:3, which are only there for topology discovery.

Comment by Luis Gomez [ 25/Apr/17 ]

BTW, the same or a similar issue is observed in the controller cluster test:

https://jenkins.opendaylight.org/releng/view/controller/job/controller-csit-3node-clustering-all-carbon/282/robot/controller-clustering.txt/Global%20Rpc%20Kill/Rpc_On_Remained_Cluster_Nodes/

Comment by Tomas Slusny [ 12/May/17 ]

This should be fixed with this patch by Jozef: https://git.opendaylight.org/gerrit/#/c/56918/

Generated at Wed Feb 07 20:33:37 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.