[NETCONF-186] Clustering: Device not mounted on node after the node reboots Created: 19/Apr/16  Updated: 15/Mar/19  Resolved: 24/Oct/16

Status: Resolved
Project: netconf
Component/s: netconf
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Jozef Behran Assignee: Rudolf Brisuda
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 5761

 Description   

1. Setup a 3-node cluster.
2. Configure a Netconf device and wait for it to connect (testool simulated one is fine).
3. Kill -9 the ODL.
4. Now when you perform an operation on the device (tested to create a piece of data) the two nodes that remain will work just fine.
5. After the dead node is brought back up, it fails to mount the device (any access to the data part of the netconf connector will result in "Mount point does not exist) but the device itself is still present in the topology.

See https://git.opendaylight.org/gerrit/37335/ for a suite that performs this test.



 Comments   
Comment by Vratko Polak [ 06/May/16 ]

> it fails to mount the device

Testtool logs confirm the new session was established. INFO level of karaf.log does not show why the mount operation failed, here is a complete segment:

2016-05-03 12:09:02,449 | INFO | lt-dispatcher-40 | BaseTopologyManager | 241 - org.opendaylight.netconf.abstract-topology - 1.0.1.Beryllium-SR1 | Member is Up: Member(address
= akka.tcp://opendaylight-cluster-data@10.30.32.233:2550, status = Up)
2016-05-03 12:09:02,449 | INFO | lt-dispatcher-16 | ShardManager | 173 - org.opendaylight.controller.sal-distributed-datastore - 1.3.1.Beryllium-SR1 | updatePeerAddress
for peer member-1-shard-inventory-operational with address akka.tcp://opendaylight-cluster-data@10.30.32.233:2550/user/shardmanager-operational/member-1-shard-inventory-operational
2016-05-03 12:09:02,449 | INFO | lt-dispatcher-16 | ShardManager | 173 - org.opendaylight.controller.sal-distributed-datastore - 1.3.1.Beryllium-SR1 | updatePeerAddress
for peer member-1-shard-entity-ownership-operational with address akka.tcp://opendaylight-cluster-data@10.30.32.233:2550/user/shardmanager-operational/member-1-shard-entity-ownership-operati
onal
2016-05-03 12:09:02,449 | INFO | lt-dispatcher-16 | ShardManager | 173 - org.opendaylight.controller.sal-distributed-datastore - 1.3.1.Beryllium-SR1 | updatePeerAddress
for peer member-1-shard-topology-operational with address akka.tcp://opendaylight-cluster-data@10.30.32.233:2550/user/shardmanager-operational/member-1-shard-topology-operational
2016-05-03 12:09:02,449 | INFO | lt-dispatcher-16 | ShardManager | 173 - org.opendaylight.controller.sal-distributed-datastore - 1.3.1.Beryllium-SR1 | updatePeerAddress for peer member-1-shard-toaster-operational with address akka.tcp://opendaylight-cluster-data@10.30.32.233:2550/user/shardmanager-operational/member-1-shard-toaster-operational
2016-05-03 12:09:02,449 | INFO | lt-dispatcher-16 | ShardManager | 173 - org.opendaylight.controller.sal-distributed-datastore - 1.3.1.Beryllium-SR1 | shard-manager-operational: All Shards are ready - data store operational is ready, available count is 0
2016-05-03 12:09:02,470 | INFO | NioProcessor-2 | ClientUserAuthServiceNew | 97 - org.apache.sshd.core - 0.14.0 | Received SSH_MSG_USERAUTH_FAILURE
2016-05-03 12:09:02,474 | INFO | NioProcessor-2 | UserAuthKeyboardInteractive | 97 - org.apache.sshd.core - 0.14.0 | Received Password authentication en-US
2016-05-03 12:09:02,475 | INFO | NioProcessor-2 | ClientUserAuthServiceNew | 97 - org.apache.sshd.core - 0.14.0 | Received SSH_MSG_USERAUTH_SUCCESS
2016-05-03 12:09:02,497 | WARN | lt-dispatcher-59 | BaseTopologyManager | 241 - org.opendaylight.netconf.abstract-topology - 1.0.1.Beryllium-SR1 | Received a custom identify message from: akka.tcp://opendaylight-cluster-data@10.30.32.98:2550
2016-05-03 12:09:02,720 | WARN | oupCloseable-6-1 | ClusteredNetconfDevice | 240 - org.opendaylight.netconf.topology - 1.0.1.Beryllium-SR1 | Node netconf-test-device SessionUp, with capabilities NetconfSessionPreferences

{capabilities=[urn:ietf:params:netconf:capability:exi:1.0, urn:ietf:params:netconf:capability:candidate:1.0, urn:ietf:params:netconf:base:1.0, urn:ietf:params:netconf:base:1.1], moduleBasedCapabilities=[(urn:opendaylight:params:xml:ns:yang:controller:config:sal-clustering-it:car-people?revision=2014-08-18)car-people, (urn:opendaylight:yang:extension:yang-ext?revision=2013-07-09)yang-ext, (urn:opendaylight:params:xml:ns:yang:controller:config:sal-clustering-it:people?revision=2014-08-18)people, (urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring?revision=2010-10-04)ietf-netconf-monitoring, (urn:opendaylight:test:netconf:crud?revision=2014-10-18)test, (urn:ietf:params:xml:ns:yang:ietf-yang-types?revision=2010-09-24)ietf-yang-types, (urn:opendaylight:params:xml:ns:yang:controller:config:sal-clustering-it:car?revision=2014-08-18)car, (urn:ietf:params:xml:ns:yang:ietf-inet-types?revision=2010-09-24)ietf-inet-types, (urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring-extension?revision=2013-12-10)ietf-netconf-monitoring-extension, (urn:opendaylight:params:xml:ns:yang:controller:config:sal-clustering-it:car-purchase?revision=2014-08-18)car-purchase], rollback=false, monitoring=true, candidate=true, writableRunning=false}

2016-05-03 12:09:02,952 | INFO | lt-dispatcher-17 | NetconfDevice | 238 - org.opendaylight.netconf.sal-netconf-connector - 1.3.1.Beryllium-SR1 | RemoteDevice

{netconf-test-device}

: Netconf connector initialized successfully
2016-05-03 12:09:06,476 | WARN | lt-dispatcher-48 | BaseTopologyManager | 241 - org.opendaylight.netconf.abstract-topology - 1.0.1.Beryllium-SR1 | Received a custom identify message from: akka.tcp://opendaylight-cluster-data@10.30.32.39:2550
2016-05-03 12:09:12,434 | INFO | 4]-nio2-thread-1 | ServerSession | 28 - org.apache.sshd.core - 0.12.0 | Server session created from /10.30.32.234:54042
2016-05-03 12:09:12,443 | INFO | 4]nio2-thread-1 | ServerSession | 28 - org.apache.sshd.core - 0.12.0 | Kex: server>client aes128-ctr hmac-sha1 none
2016-05-03 12:09:12,443 | INFO | 4]nio2-thread-1 | ServerSession | 28 - org.apache.sshd.core - 0.12.0 | Kex: client>server aes128-ctr hmac-sha1 none
2016-05-03 12:09:12,508 | INFO | 4]-nio2-thread-3 | ServerUserAuthService | 28 - org.apache.sshd.core - 0.12.0 | Session karaf@/10.30.32.234:54042 authenticated
2016-05-03 12:09:12,707 | INFO | h for user karaf | command | 45 - org.apache.karaf.log.command - 3.0.3 | ROBOT MESSAGE: Starting test Check_New_Device_Data_Is_Visible_On_node1
2016-05-03 12:10:53,783 | INFO | h for user karaf | command | 45 - org.apache.karaf.log.command - 3.0.3 | ROBOT MESSAGE: Starting test Kill_node2_Before_Modify

Comment by Vratko Polak [ 06/May/16 ]

(I forgot to mention this was SR1 build, recent Beryllium versions may behave differently.)

Interestingly, in the same suite, the same node manages to perform the mount when it gains ownership of the device entity:

2016-05-03 12:14:25,236 | INFO | h for user karaf | command | 45 - org.apache.karaf.log.command - 3.0.3 | ROBOT MESSAGE: Starting test Restart_node3_After_Delete_An
d_Dump_Its_Topology_Data
2016-05-03 12:14:25,241 | WARN | lt-dispatcher-39 | TopologyRoleChangeStrategy | 241 - org.opendaylight.netconf.abstract-topology - 1.0.1.Beryllium-SR1 | Gained ownership of entity, r
egistering datastore listener
2016-05-03 12:14:25,248 | INFO | lt-dispatcher-39 | kka://opendaylight-cluster-data) | 159 - com.typesafe.akka.slf4j - 2.3.14 | Cluster Node [akka.tcp://opendaylight-cluster-data@10.30.32.2
33:2550] - Trying to join [akka.tcp://opendaylight-cluster-data@10.30.32.233:2550] when already part of a cluster, ignoring
2016-05-03 12:14:25,254 | WARN | lt-dispatcher-38 | BaseTopologyManager | 241 - org.opendaylight.netconf.abstract-topology - 1.0.1.Beryllium-SR1 | NodeUri [_value=netconf-test-
device] already exists, triggering update..
2016-05-03 12:14:25,258 | WARN | lt-dispatcher-20 | NetconfDeviceCommunicator | 238 - org.opendaylight.netconf.sal-netconf-connector - 1.3.1.Beryllium-SR1 | RemoteDevice

{netconf-test -device}

: Session terminated Session closed
2016-05-03 12:14:25,270 | WARN | lt-dispatcher-38 | LoggingSalNodeWriter | 241 - org.opendaylight.netconf.abstract-topology - 1.0.1.Beryllium-SR1 | Delete recieved
2016-05-03 12:14:25,271 | WARN | lt-dispatcher-38 | LoggingSalNodeWriter | 241 - org.opendaylight.netconf.abstract-topology - 1.0.1.Beryllium-SR1 | NodeId: netconf-test-device
2016-05-03 12:14:25,272 | WARN | lt-dispatcher-39 | BaseTopologyManager | 241 - org.opendaylight.netconf.abstract-topology - 1.0.1.Beryllium-SR1 | Delete part of update succesf
ull, triggering create
2016-05-03 12:14:25,275 | WARN | lt-dispatcher-18 | LoggingSalNodeWriter | 241 - org.opendaylight.netconf.abstract-topology - 1.0.1.Beryllium-SR1 | Init recieved
2016-05-03 12:14:25,275 | WARN | lt-dispatcher-18 | LoggingSalNodeWriter | 241 - org.opendaylight.netconf.abstract-topology - 1.0.1.Beryllium-SR1 | NodeId: netconf-test-device
2016-05-03 12:14:25,275 | WARN | lt-dispatcher-18 | LoggingSalNodeWriter | 241 - org.opendaylight.netconf.abstract-topology - 1.0.1.Beryllium-SR1 | Node: Node [_key=NodeKey [_no
deId=Uri [_value=netconf-test-device]], _nodeId=Uri [_value=netconf-test-device], augmentation=[NetconfNode [_availableCapabilities=AvailableCapabilities [_availableCapability=[], augmentati
on=[]], _clusteredConnectionStatus=ClusteredConnectionStatus [_nodeStatus=[NodeStatus [_node=akka.tcp://opendaylight-cluster-data@10.30.32.233:2550, _status=Unavailable, augmentation=[]]], a
ugmentation=[]], _connectionStatus=Connecting, _host=Host [_ipAddress=IpAddress [_ipv4Address=Ipv4Address [_value=10.30.32.148], _value=[1, 0, ., 3, 0, ., 3, 2, ., 1, 4, 8]], _value=[1, 0, .
, 3, 0, ., 3, 2, ., 1, 4, 8]], _port=PortNumber [_value=17830], _unavailableCapabilities=UnavailableCapabilities [_unavailableCapability=[], augmentation=[]]]]]
2016-05-03 12:14:25,282 | INFO | lt-dispatcher-38 | AbstractNetconfTopology | 240 - org.opendaylight.netconf.topology - 1.0.1.Beryllium-SR1 | Connecting RemoteDevice

{Uri [_value=ne tconf-test-device]}

, with config Node{getNodeId=Uri [_value=netconf-test-device], augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.netconf.node.topology.rev150114.Netc
onfNode=NetconfNode{getBetweenAttemptsTimeoutMillis=2000, getConnectionTimeoutMillis=20000, getCredentials=LoginPassword{getPassword=topsecret, getUsername=admin, augmentations={}}, getDefau
ltRequestTimeoutMillis=60000, getHost=Host [_ipAddress=IpAddress [_ipv4Address=Ipv4Address [_value=10.30.32.148], _value=[1, 0, ., 3, 0, ., 3, 2, ., 1, 4, 8]], _value=[1, 0, ., 3, 0, ., 3, 2
, ., 1, 4, 8]], getKeepaliveDelay=0, getMaxConnectionAttempts=0, getPort=PortNumber [_value=17830], getSchemaCacheDirectory=schema, getSleepFactor=1.5, isReconnectOnChangedSchema=false, isTc
pOnly=false}}}

Comment by Rudolf Brisuda [ 05/Sep/16 ]

https://git.opendaylight.org/gerrit/#/c/45147/

Comment by Rudolf Brisuda [ 24/Oct/16 ]

Fixed with new netconf clustering topology implementation: https://git.opendaylight.org/gerrit/#/c/46660/

Generated at Wed Feb 07 20:14:23 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.