[NETCONF-220] Getting content from cluster from the second follower causes an "AskTimeoutException" Created: 15/Jun/16  Updated: 15/Mar/19  Resolved: 24/Oct/16

Status: Resolved
Project: netconf
Component/s: netconf
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Ernest Deák Assignee: Rudolf Brisuda
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 6067

 Description   

Fetching data from the second follower in a 3-node cluster while the original owner is down causes a Java exception to be returned in the response text:

<errors xmlns="urn:ietf:params:xml:ns:yang:ietf-restconf"><error><error-type>application</error-type><error-tag>operation-failed</error-tag><error-message>Problem to get data from transaction.</error-message><error-info>akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://opendaylight-cluster-data@10.29.11.243:2550/user/topology-netconf/netconf-test-device/mountpoint#1234942408]] after [5000 ms]
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:599)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:597)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
at java.lang.Thread.run(Thread.java:745)
</error-info></error></errors>

The response code returned is 500.
The URI requested was: /restconf/config/network-topology:network-topology/topology/topology-netconf/node/netconf-test-device/yang-ext:mount
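
For triage, the failure can be recognized from the response body alone. The following is a minimal sketch, not part of the CSIT suite, that parses a RESTCONF errors document like the one above and reports whether it carries an AskTimeoutException. The function name `parse_restconf_error` and the keys of the returned dictionary are hypothetical; only the namespace and element names come from the response shown in this report.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the <errors> element in the response above.
NS = {"rc": "urn:ietf:params:xml:ns:yang:ietf-restconf"}


def parse_restconf_error(body):
    """Return the fields of the first <error> in a RESTCONF errors document.

    Returns None when the document contains no <error> element.
    """
    root = ET.fromstring(body)
    error = root.find("rc:error", NS)
    if error is None:
        return None
    info = error.findtext("rc:error-info", default="", namespaces=NS)
    return {
        "type": error.findtext("rc:error-type", default="", namespaces=NS),
        "tag": error.findtext("rc:error-tag", default="", namespaces=NS),
        "message": error.findtext("rc:error-message", default="", namespaces=NS),
        # True when the stack trace in <error-info> names the Akka ask timeout.
        "ask_timeout": "akka.pattern.AskTimeoutException" in info,
    }
```

Matching on the exception class name in `error-info` is a deliberately simple heuristic; it would not catch the Boron variant described in the comments, where the data is returned empty without an exception.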

The request succeeds when the first follower (the new owner) is queried.

Sandbox test results: https://jenkins.opendaylight.org/sandbox/job/netconf-csit-3node-clustering-only-beryllium/3/

The test to look for is in the "Entity" suite, under the name "Check_Config_Data_Before_Modification_With_Original_Owner_Down".



 Comments   
Comment by Jakub Morvay [ 16/Jun/16 ]

Hi Ernest,

This may be another symptom of NETCONF-187, although with much simpler reproduction steps.

Comment by Rudolf Brisuda [ 05/Sep/16 ]

I am unable to reproduce this.
I took these steps:
1. Inserted data on one node.
2. Shut down the master node.
3. Checked the data on the remaining nodes.

On both nodes I got 200 OK with the inserted data.
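
The check on the remaining nodes can be sketched as below. This is an illustration under stated assumptions, not the procedure actually used in this comment: port 8181 and the host addresses in the usage note are assumed, authentication is omitted (a real controller would normally require credentials), and the names `http_get` and `check_remaining_nodes` are hypothetical. Only the request path comes from the issue description.

```python
import urllib.error
import urllib.request

# Path taken from the issue description.
MOUNT_PATH = ("/restconf/config/network-topology:network-topology/topology/"
              "topology-netconf/node/netconf-test-device/yang-ext:mount")


def http_get(url, timeout=10):
    """Return (status, body) for a GET request, including error responses."""
    request = urllib.request.Request(url, headers={"Accept": "application/xml"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the 500 response is the case of interest here.
        return err.code, err.read().decode("utf-8")


def check_remaining_nodes(hosts, fetch=http_get, port=8181):
    """Map each host to the HTTP status returned for the mount point."""
    return {host: fetch(f"http://{host}:{port}{MOUNT_PATH}")[0] for host in hosts}
```

Calling `check_remaining_nodes(["10.0.0.2", "10.0.0.3"])` with the addresses of the two surviving members (placeholders here) would return a status per node. The `fetch` parameter is injectable so the check can be exercised without a running cluster.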

Comment by Vratko Polak [ 10/Oct/16 ]

This is still happening frequently in CSIT. Recent Boron example: Created data is seen on a follower (member-1) [0], but after entity owner (member-2) is killed, data is seen as empty [1].

Compared to the original description, no AskTimeoutException is visible in the follower's karaf.log, but several warnings are visible in the karaf.log of the new owner (member-3) [2] during the critical 26th minute.

[0] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-all-boron/263/archives/log.html.gz#s1-s4-t7-k2-k1-k2
[1] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-all-boron/263/archives/log.html.gz#s1-s4-t10-k2-k2-k2
[2] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-all-boron/263/archives/odl3_karaf.log.gz

Comment by Rudolf Brisuda [ 24/Oct/16 ]

This is no longer relevant because netconf topology clustering has been rewritten. If the problem persists, it has a different cause and this report is no longer useful; a new bug should be created with the proper state.

Generated at Wed Feb 07 20:14:28 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.