[NETCONF-470] Device access can fail shortly after cluster member is killed Created: 12/Sep/17 Updated: 31/Jan/22 Resolved: 31/Jan/22 |
|
| Status: | Resolved |
| Project: | netconf |
| Component/s: | netconf |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Vratko Polak | Assignee: | Kostiantyn Nosach |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Attachments: |
|
| External issue ID: | 9148 |
| Description |
|
This manifests as a Robot failure, especially in For example, in this [0] failure, a POST fails with: The real cause is: Looking at karaf.log [1], new leaders had not yet been elected at that time, so the akka ask is expected to fail. The data broker now supports a tell-based protocol, designed to work in such cases. As netconf does not use the data broker, it should improve its own code to offer similar functionality, or at least document that accessing mounted devices can randomly fail during cluster HA events. Robot tests can be relaxed (by waiting for new leaders) if netconf behavior is not going to be improved soon. [0] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-clustering-all-carbon/399/log.html.gz#s1-s9-t7-k2-k1-k1-k4-k7-k1 |
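The test-side relaxation mentioned above (waiting for new leaders) could look roughly like the following sketch. It is only an illustration: `retry_until`, `LeaderNotReady` and `make_flaky_device_access` are hypothetical names, not part of netconf, the controller, or the Robot suites, and the simulated device access stands in for a real request to a mounted device.

```python
import time


def retry_until(action, timeout_s=30.0, interval_s=1.0, retriable=(Exception,),
                clock=time.monotonic, sleep=time.sleep):
    """Call action() until it succeeds or timeout_s elapses.

    Returns (result, attempts). Hypothetical helper, not an ODL or Robot API.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        try:
            return action(), attempts
        except retriable:
            # Once the deadline has passed, give up and re-raise the last error.
            if clock() >= deadline:
                raise
            sleep(interval_s)


class LeaderNotReady(Exception):
    """Stand-in for the failure seen while no leader is elected (assumption)."""


def make_flaky_device_access(failures_before_success):
    """Simulate a mounted-device request that fails until a leader exists."""
    state = {"remaining": failures_before_success}

    def access():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise LeaderNotReady("no leader elected yet")
        return "ok"

    return access
```

In a Robot suite the same effect is usually achieved with the built-in `Wait Until Keyword Succeeds` keyword wrapped around the device request, so a helper like this would only be needed in custom Python keywords.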
| Comments |
| Comment by Tomas Cere [ 12/Oct/17 ] |
|
For now this is expected. To switch to a tell-based protocol we would have to rewrite the singleton again. However, we should be able to leverage the tell-based stuff from controller if we switched mountpoints to Shards that behave differently from their datastore counterparts: nonpersistent, with no replication, and without actually having a backend that stores data, only one that forwards it to the device. |
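To make the difference from a plain ask concrete, here is a toy sketch of the idea behind a forwarding-only front end that survives a leader change: requests sent while no leader is known are queued and replayed once a leader appears, instead of failing on a timeout. This is an assumption-laden illustration, not the controller's actual tell-based implementation; `ForwardingClient` and its methods are hypothetical names, and the "leader" is modelled as a plain callable that forwards a request to the device.

```python
from collections import deque


class ForwardingClient:
    """Toy client that queues requests while no leader is known.

    Hypothetical illustration only; not the controller's actual classes.
    """

    def __init__(self):
        self._leader = None       # callable forwarding to the device, or None
        self._pending = deque()   # (request, callback) pairs awaiting a leader

    def send(self, request, callback):
        if self._leader is None:
            # No leader yet: keep the request instead of failing it.
            self._pending.append((request, callback))
        else:
            callback(self._leader(request))

    def leader_elected(self, leader):
        self._leader = leader
        # Replay queued requests in their original order.
        while self._pending:
            request, callback = self._pending.popleft()
            callback(leader(request))

    def leader_lost(self):
        self._leader = None
```

Nothing is persisted or replicated here, matching the "only forwards it to the device" behavior described in the comment; a real implementation would also need timeouts and handling for requests that were in flight when the leader was lost.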