[CONTROLLER-1611] 501 on buy-car from remote member Created: 22/Feb/17  Updated: 25/Jul/23  Resolved: 28/Feb/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Vratko Polak Assignee: Vratko Polak
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 7840

 Description   

Recent Carbon snapshot builds started failing in CSIT on the Buycar_Failover suite, sometime between 2017-02-09 and 2017-02-12 (most recent failure: [0]).

I believe change [1] introduced the failure: a Sandbox build with it [2] fails [3], while a build with its parent change [4] does not [5].

Copied response text:
{"errors":{"error":[{"error-type":"application","error-tag":"operation-not-supported","error-message":"No implementation of RPC AbsoluteSchemaPath{path=[(urn:opendaylight:params:xml:ns:yang:controller:config:sal-clustering-it:car-purchase?revision=2014-08-18)buy-car]} available"}]}}
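
For anyone scripting a check for this failure, the copied response can be parsed as ordinary JSON. The following is a minimal sketch, not part of the CSIT suite: the function name error_tags is hypothetical, and the response string is the text above with its line breaks rejoined.

```python
import json

# Response text copied from this report, with the line breaks rejoined.
RESPONSE_TEXT = (
    '{"errors":{"error":[{"error-type":"application",'
    '"error-tag":"operation-not-supported",'
    '"error-message":"No implementation of RPC AbsoluteSchemaPath'
    '{path=[(urn:opendaylight:params:xml:ns:yang:controller:config:'
    'sal-clustering-it:car-purchase?revision=2014-08-18)buy-car]} available"}]}}'
)


def error_tags(response_text):
    """Return the error-tag of every entry in an errors response body."""
    body = json.loads(response_text)
    return [err.get("error-tag") for err in body["errors"]["error"]]


print(error_tags(RESPONSE_TEXT))
```

The tag extracted here is the one that accompanies the 501 status in the issue title.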

karaf.log contains nothing relevant (at the default log level).

[0] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/591/archives/log.html.gz#s1-s6-t3-k3-k1-k1-k1-k1-k1-k1-k3-k7-k1
[1] https://git.opendaylight.org/gerrit/50622
[2] https://jenkins.opendaylight.org/sandbox/job/integration-multipatch-test-carbon/9/parameters/
[3] https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/11/archives/log.html.gz#s1-s6-t3-k3-k1-k1-k1-k1-k1-k1-k3-k7-k1
[4] https://jenkins.opendaylight.org/sandbox/job/integration-multipatch-test-carbon/10/parameters/
[5] https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/12/archives/log.html.gz#s1-s6-t3-k3-k1-k1-k1-k1-k1-k1-k3-k7-k1



 Comments   
Comment by Vratko Polak [ 23/Feb/17 ]

Enabling debug logging showed [6] only one suspicious type of line:

2017-02-23 09:20:56,699 | DEBUG | lt-dispatcher-16 | RpcRegistry | 228 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | akka.tcp://opendaylight-cluster-data@10.29.13.96:2550: receiveUpdateRemoteBuckets: {}

10.29.13.96 is member-2, a follower of the people shard; add-person was called there.
Still no clue why member-1 has not received the memo.

[6] https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/17/archives/odl2_karaf.log.gz

Comment by Vratko Polak [ 23/Feb/17 ]

I think I understand what happens now. See [7].

A Sandbox log [8], created with logging patch [9], shows that member-1 (10.29.12.253) decreases its version from 4294967297 to 0 (because the car_outage_corners suite wipes its persisted data):

2017-02-23 17:03:41,122 | TRACE | lt-dispatcher-20 | BucketStoreAccess | 231 - org.opendaylight.controller.sal-remoterpc-connector - 1.5.0.SNAPSHOT | BucketStoreAccess getBucketVersions complete: {akka.tcp://opendaylight-cluster-data@10.29.13.93:2550=4294967298, akka.tcp://opendaylight-cluster-data@10.29.12.253:2550=4294967297, akka.tcp://opendaylight-cluster-data@10.29.13.80:2550=8589934593}

2017-02-23 17:04:53,181 | TRACE | lt-dispatcher-17 | BucketStoreAccess | 231 - org.opendaylight.controller.sal-remoterpc-connector - 1.5.0.SNAPSHOT | BucketStoreAccess getBucketVersions complete: {akka.tcp://opendaylight-cluster-data@10.29.12.253:2550=0, akka.tcp://opendaylight-cluster-data@10.29.13.80:2550=8589934593}

As stated in [7], handling of initial incarnations should be reworked.
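
The version values in the log are easier to read when split at bit 32. The sketch below does only that arithmetic. Reading the upper half as an incarnation number and the lower half as a counter within that incarnation is an assumption, suggested by the values and by the mention of incarnations, and is not confirmed by the log itself; the function name split_version is hypothetical.

```python
def split_version(version):
    """Split a 64-bit version into its upper and lower 32-bit halves.

    Assumption: upper half = incarnation, lower half = per-incarnation counter.
    """
    return version >> 32, version & 0xFFFFFFFF


# Values taken from the getBucketVersions log lines in this comment.
for value in (4294967297, 4294967298, 8589934593, 0):
    upper, lower = split_version(value)
    print(f"{value}: upper={upper} lower={lower}")
```

Under that reading, 4294967297 is (1, 1), 4294967298 is (1, 2), 8589934593 is (2, 1), and the wiped member's 0 is (0, 0), which is lower than anything it advertised before.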

[7] https://git.opendaylight.org/gerrit/#/c/50622/22/opendaylight/md-sal/sal-remoterpc-connector/src/main/java/org/opendaylight/controller/remote/rpc/registry/gossip/BucketStore.java@169
[8] https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/24/archives/odl1_karaf.log.gz
[9] https://git.opendaylight.org/gerrit/#/c/52165/9

Comment by Vratko Polak [ 23/Feb/17 ]

> car_outage_corners suite wipes its persisted data

Robert told me this is not a good idea, at least without notifying live members. Will update the suites to explicitly "down" the wiped members.

Comment by Vratko Polak [ 24/Feb/17 ]

It seems "down" is not needed; the problem is that ODL removes only bucket data, not bucket versions, when a member becomes Unreachable.

Carbon fix: https://git.opendaylight.org/gerrit/#/c/52260/2
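
To illustrate why a leftover bucket version matters, here is a toy model of a gossip receiver. It is not ODL's BucketStore code: the class ToyBucketStore, its methods, and the rule "accept a bucket only if its version is higher than the last known one" are assumptions made for illustration. Only the version numbers and the data-versus-version distinction come from this report.

```python
class ToyBucketStore:
    """Toy gossip receiver; a hypothetical stand-in, not ODL's BucketStore."""

    def __init__(self, drop_versions_on_unreachable):
        self.drop_versions_on_unreachable = drop_versions_on_unreachable
        self.data = {}      # member -> bucket contents (advertised RPCs)
        self.versions = {}  # member -> last accepted bucket version

    def receive(self, member, version, bucket):
        """Accept a gossiped bucket only if it is newer than the known one."""
        known = self.versions.get(member)
        if known is not None and version <= known:
            return False  # treated as stale and ignored
        self.versions[member] = version
        self.data[member] = bucket
        return True

    def on_unreachable(self, member):
        """Forget the member's bucket data and, optionally, its version."""
        self.data.pop(member, None)
        if self.drop_versions_on_unreachable:
            self.versions.pop(member, None)


def rpc_visible_after_wipe(drop_versions):
    """Replay: member advertises, becomes unreachable, returns with version 0."""
    store = ToyBucketStore(drop_versions)
    store.receive("wiped-member", 4294967297, {"buy-car"})
    store.on_unreachable("wiped-member")
    store.receive("wiped-member", 0, {"buy-car"})
    return "buy-car" in store.data.get("wiped-member", set())


print("versions kept:   ", rpc_visible_after_wipe(False))
print("versions dropped:", rpc_visible_after_wipe(True))
```

In this model, keeping the old version makes the receiver reject everything the wiped member sends after it restarts from 0, so its RPCs stay invisible. Dropping the version together with the data lets the member be learned afresh.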

Comment by Vratko Polak [ 28/Feb/17 ]

> Carbon fix

Merged.
