[NETCONF-458] Remount attempt in Oxygen three node cluster results in unsuccessful initial read issued by vpp renderer
Created: 24/Aug/17  Updated: 15/Mar/19  Resolved: 28/Sep/17

Status: Resolved
Project: netconf
Component/s: netconf
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Juraj Linkeš Assignee: Tomas Cere
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Linux
Platform: PC


Attachments: Zip Archive logs.zip    
External issue ID: 9048

 Description   

Environment: three Oxygen nodes in a cluster on CentOS

These are the steps I'm doing:

1. mount 5 devices into /config/network-topology:network-topology/topology/topology-netconf/ with this configuration:

{
  "netconf-node-topology:between-attempts-timeout-millis": 10000,
  "netconf-node-topology:connection-timeout-millis": 10000,
  "netconf-node-topology:default-request-timeout-millis": 10000,
  "netconf-node-topology:host": "192.168.10.26",
  "netconf-node-topology:keepalive-delay": 0,
  "netconf-node-topology:max-connection-attempts": 10,
  "netconf-node-topology:password": "admin",
  "netconf-node-topology:port": 2831,
  "netconf-node-topology:tcp-only": false,
  "netconf-node-topology:username": "admin",
  "node-id": "overcloud-novacompute-0.opnfv.org"
}

This mount works fine; the vpp renderer reads interfaces successfully.
2. delete these five mounts
3. remount the 5 devices the same way; the vpp renderer then fails to read the interface configuration from any of the devices

Logs attached



 Comments   
Comment by Juraj Linkeš [ 24/Aug/17 ]

Attachment logs.zip has been added with description: logs from all three controllers

Comment by Jakub Morvay [ 06/Sep/17 ]

From the logs it seems that the remount attempt succeeded. The problem is that right after the devices are mounted, the application tries to write some invalid data to them. This causes the ReadWriteTransactionActor to fail, and the mountpoints are unusable afterwards.

Comment by Tomas Cere [ 26/Sep/17 ]

The device only provides this revision of the interface-nat model:
interface-nat, namespace=urn:opendaylight:params:xml:ns:yang:interface:nat, revision=2017-08-01

while vpp (or whoever the writer is) is trying to write data with a newer revision:
(urn:opendaylight:params:xml:ns:yang:interface:nat?revision=2017-08-16)nat]

This needs to be fixed either on the writer side, by using the revision that's provided by the device, or on the device side, by providing this newer revision.
That said, the way the netconf side handles such a bad write could be improved.
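
The revision mismatch described above can be checked before writing. The following is only a minimal sketch, not code from netconf or gbp: the class and method names are hypothetical, and it assumes the device advertises its YANG modules as capability URIs of the form namespace?module=name&revision=date (the format defined in RFC 6020). It compares the revision a writer wants to use against the one the device advertises.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; illustrates a revision check, not an ODL API. */
final class RevisionCheckSketch {

    /** Maps namespace to revision for every capability that carries a revision parameter. */
    static Map<String, String> advertisedRevisions(List<String> capabilities) {
        Map<String, String> result = new HashMap<>();
        for (String cap : capabilities) {
            int q = cap.indexOf('?');
            if (q < 0) {
                continue; // no query part, so no revision to record
            }
            String namespace = cap.substring(0, q);
            for (String param : cap.substring(q + 1).split("&")) {
                if (param.startsWith("revision=")) {
                    result.put(namespace, param.substring("revision=".length()));
                }
            }
        }
        return result;
    }

    /** True only if the device advertises exactly this revision of the namespace. */
    static boolean deviceSupports(Map<String, String> advertised, String namespace, String revision) {
        return revision.equals(advertised.get(namespace));
    }

    public static void main(String[] args) {
        String ns = "urn:opendaylight:params:xml:ns:yang:interface:nat";
        Map<String, String> advertised = advertisedRevisions(
                Collections.singletonList(ns + "?module=interface-nat&revision=2017-08-01"));
        System.out.println(deviceSupports(advertised, ns, "2017-08-01")); // revision the device provides
        System.out.println(deviceSupports(advertised, ns, "2017-08-16")); // revision the writer used
    }
}
```

With the revisions quoted in this comment, the first check passes and the second does not, which is the mismatch reported here.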

Comment by Tomas Cere [ 27/Sep/17 ]

After further digging, this turns out to be a bug in gbp.

The data broker is cached here:
https://github.com/opendaylight/groupbasedpolicy/blob/c785ab9c27e42f02f500cc9ce54915a7aec7ed1c/renderers/vpp/src/main/java/org/opendaylight/groupbasedpolicy/renderer/vpp/util/MountedDataBrokerProvider.java#L53-L55

The cached broker holds a reference to an actorRef, and that actorRef changes after the device connector is restarted. However, the cached reference is never cleared, so the broker tries to write to an already terminated actor and the transaction fails.

This is also confirmed by this log:
2017-09-26 12:53:37,909 | DEBUG | on-dispatcher-38 | MountedDataBrokerProvider | 277 - org.opendaylight.groupbasedpolicy.vpp-renderer - 0.7.0.SNAPSHOT | Node connected, mountpoint with iid KeyedInstanceIdentifier{targetType=interface org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.network.topology.topology.Node, path=[org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.NetworkTopology, org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.network.topology.Topology[key=TopologyKey [_topologyId=Uri [_value=topology-netconf]]], org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.network.topology.topology.Node[key=NodeKey [_nodeId=Uri [_value=overcloud-novacompute-0.opnfv.org]]]]} available

which is logged only when gbp tries to retrieve the data broker, and that happens only when the first connector is created.
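
The stale-reference problem described above can be illustrated with a small cache that drops its entry when a node goes away. This is only a sketch under stated assumptions, not the actual gbp code or its fix: MountBrokerHandle, BrokerCacheSketch and the actor names are hypothetical stand-ins, and the real MountedDataBrokerProvider works with ODL data broker and mount point types instead.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a mount point's data broker; not an ODL type. */
final class MountBrokerHandle {
    final String actorPath; // identifies the actor backing this mount

    MountBrokerHandle(String actorPath) {
        this.actorPath = actorPath;
    }
}

/** Sketch of a per-node broker cache that is cleared when the node is unmounted. */
final class BrokerCacheSketch {
    private final Map<String, MountBrokerHandle> cache = new ConcurrentHashMap<>();

    /** Called when a mount point becomes available; replaces any older entry. */
    void onNodeConnected(String nodeId, MountBrokerHandle handle) {
        cache.put(nodeId, handle);
    }

    /** Called when the mount is deleted or disconnected; drops the stale entry. */
    void onNodeDisconnected(String nodeId) {
        cache.remove(nodeId);
    }

    /** Returns the cached handle, or empty if the node is not currently mounted. */
    Optional<MountBrokerHandle> lookup(String nodeId) {
        return Optional.ofNullable(cache.get(nodeId));
    }

    public static void main(String[] args) {
        BrokerCacheSketch cache = new BrokerCacheSketch();
        String node = "overcloud-novacompute-0.opnfv.org";
        cache.onNodeConnected(node, new MountBrokerHandle("actor-1")); // first mount
        cache.onNodeDisconnected(node);                                // mount deleted
        cache.onNodeConnected(node, new MountBrokerHandle("actor-2")); // remount
        System.out.println(cache.lookup(node).get().actorPath);       // prints "actor-2"
    }
}
```

Without the removal on disconnect, and with a cache that is filled only on the first connection, the lookup after the remount would still return the handle for the terminated actor, which matches the failure seen in this issue.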

Comment by Michal Cmarada [ 27/Sep/17 ]

Fix in GBP:
https://git.opendaylight.org/gerrit/#/c/63619/
