[NETCONF-340] After killing device owner, there is a period of 404. Created: 23/Jan/17  Updated: 15/Mar/19  Resolved: 27/Aug/18

Status: Resolved
Project: netconf
Component/s: restconf-nb
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Vratko Polak Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 7661

 Description   

Status 404 means a resource is currently missing, but it is not clear whether this should cover cases when the connection to the device is temporarily down, or whether 503 should be reported for those cases.
Doubly so if the fluctuation of the connectivity is not caused by the device or the network, but by a failed (killed) ODL cluster member.

If 404 is to be expected, it should be documented and the suites updated. The rest of this description assumes Restconf user should not see 404.

This issue is seen in Carbon (both only [0] and all [1]) and Boron (only [2]; all is not stable enough), and it might be timing dependent (currently CSIT spends 10 seconds in a futile wait for a Karaf SSH response when logging the start of a test case; the timeout is needed to avoid the ODLPARENT-49 symptom in the first few test cases).

In each case the karaf.log [3] [4] [5] shows an election of a new owner and a reconnect, but time synchronization is not precise enough to determine the exact time 404 was generated. The Boron log [5] contains a long segment (between 2017-01-22 20:05:10,931 and 2017-01-22 20:05:14,521) which seems to be relevant.

Presumably, there is a period of time (before the new owner successfully connects to the device) where the status is "connecting", so nothing is mounted, so 404 may be expected. Except that users have no way to check the connection status and access the resource at the same instant, so currently they cannot tell whether 404 means "it is not there" or "we do not know right now".
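The ambiguity above can be sketched as follows. This is a hypothetical illustration, not actual RESTCONF code: `status_for`, `mounted_devices`, and `known_devices` are invented names standing in for the server's mount and configuration state. It shows that a 404-only server emits the same status in both cases, while a status-aware server could return 503 during the reconnect window.

```python
# Hypothetical sketch (not actual RESTCONF code): the same 404 is produced
# whether the device was never configured or the new owner is still
# connecting, so a client cannot tell the cases apart from the status alone.

def status_for(device, mounted_devices, known_devices):
    """Return (status a 404-only server emits,
               status a connection-status-aware server could emit)."""
    if device in mounted_devices:
        return 200, 200
    # Today: anything not mounted is a plain 404.
    current = 404
    # A status-aware server could return 503 while the owner reconnects,
    # reserving 404 for devices that are not configured at all.
    desired = 503 if device in known_devices else 404
    return current, desired

print(status_for("netopeer", set(), {"netopeer"}))  # (404, 503): reconnecting
print(status_for("ghost", set(), {"netopeer"}))     # (404, 404): truly absent
```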

[0] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-only-carbon/399/archives/log.html.gz#s1-s6-t17-k2-k2-k2
[1] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-all-carbon/160/archives/log.html.gz#s1-s6-t12-k2-k1-k3-k6
[2] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-only-boron/736/archives/log.html.gz#s1-s6-t17-k2-k2-k2
[3] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-only-carbon/399/archives/odl1_karaf.log.gz
[4] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-all-carbon/160/archives/odl3_karaf.log.gz
[5] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-only-boron/736/archives/odl1_karaf.log.gz



 Comments   
Comment by Vratko Polak [ 24/Jan/17 ]

Two solutions to this bug would help with CSIT:

1. Return HTTP status code 503 so that the client knows it should retry the request later.

2. Make the Restconf request block while owner is connecting or not elected yet.

Hybrid solutions are also possible, for example blocking for some time but returning 503 when the time expires.
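The hybrid option can be sketched like this. This is a minimal illustration only, assuming a hypothetical `is_mounted` callable that stands in for whatever mount-point lookup the server would use; it is not the actual RESTCONF implementation.

```python
import time

# Hypothetical sketch of the hybrid approach: block while the new owner is
# connecting, but give up with 503 once a deadline expires.
# `is_mounted` is an invented stand-in for the server's mount-point lookup.

def handle_request(is_mounted, timeout=5.0, poll=0.1):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_mounted():
            return 200  # device mounted: serve the resource
        time.sleep(poll)
    return 503  # still not mounted: tell the client to retry later

# Usage: a device that becomes mounted after three polls is served normally.
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    return state["calls"] >= 3

print(handle_request(flaky, timeout=1.0, poll=0.01))            # 200
print(handle_request(lambda: False, timeout=0.05, poll=0.01))   # 503
```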

I just realized that owner election is just one of the possible reasons why a device might not be mounted, so I have opened [6] as a Restconf Change request.

[6] https://bugs.opendaylight.org/show_bug.cgi?id=7668

Comment by Tomas Cere [ 12/Oct/17 ]

As I said in 7668, there is no way for restconf to distinguish whether the device is reconnecting/temporarily unavailable or just not present at all, since it only has access to MountPointService, which only says whether a MountPoint for a given Identifier is present or not.
When you kill a device owner, temporary unavailability is expected until the southbound plugin recovers and selects a new leader, but I don't think there's a way to change the status code with the information available to Restconf.
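The information limit described above can be illustrated with a small sketch. `FakeMountPointService` is an invented stand-in, not the real MD-SAL API: the point is only that a present-or-absent lookup makes "reconnecting" and "never configured" indistinguishable.

```python
# Illustrative stand-in for the MountPointService-style lookup described in
# the comment above: it answers only "present or not", with no detail about
# why a mount point is absent.

class FakeMountPointService:
    def __init__(self, mounted):
        self._mounted = set(mounted)

    def get_mount_point(self, identifier):
        # Returns the mount point if present, else None; no status detail.
        return identifier if identifier in self._mounted else None

svc = FakeMountPointService(mounted=[])
# Device whose owner was killed (reconnecting) -> None.
# Device that was never configured at all    -> also None.
print(svc.get_mount_point("reconnecting-device"))  # None
print(svc.get_mount_point("nonexistent-device"))   # None
```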

Comment by Robert Varga [ 27/Aug/18 ]

This is an implication of how RPC registrations propagate; this works as designed (eventual consistency).
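If the 404 window is eventual consistency working as designed, a client (such as a CSIT keyword) can cope by retrying 404 responses for a bounded period. A minimal sketch, assuming a hypothetical `fetch` callable that stands in for an actual RESTCONF GET:

```python
import time

# Client-side workaround sketch: retry on 404 until a deadline, since the
# registration may simply not have propagated yet. `fetch` is an invented
# stand-in for a RESTCONF GET returning (status, body).

def get_with_retry(fetch, deadline_s=10.0, poll_s=0.1):
    deadline = time.monotonic() + deadline_s
    while True:
        status, body = fetch()
        if status != 404 or time.monotonic() >= deadline:
            return status, body
        time.sleep(poll_s)  # propagation window: try again shortly

# Usage: a mount point that appears after a few polls.
state = {"n": 0}
def fetch():
    state["n"] += 1
    return (200, "data") if state["n"] >= 3 else (404, "")

print(get_with_retry(fetch, deadline_s=1.0, poll_s=0.01))  # (200, 'data')
```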

Generated at Wed Feb 07 20:14:45 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.