[NETVIRT-1572] Connectivity to VMs is lost after ODL nodes are brought up and down in a particular sequence. Created: 13/Mar/19  Updated: 14/Jan/20

Status: Open
Project: netvirt
Component/s: None
Affects Version/s: Magnesium
Fix Version/s: None

Type: Bug Priority: Medium
Reporter: Jaya Priyadarshini Assignee: Srinivas Rachakonda
Resolution: Unresolved Votes: 0
Labels: csit:3node
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Steps to reproduce
===========================
1) Create All Controller Sessions
2) Take Down Leader Of Default Shard and bring it up
3) Create Networks and Create Subnets For net_1, Create Subnets For net_2, Add Ssh Allow All Rule
4) Create Vm Instances For net_1
   Create Vm Instances For net_2
   Check Vm Instances Have Ip Address
5) Create Router router_2, router_3 and add interfaces to this router
6) Take Down ODL1
7) Bring up ODL1 and take down ODL2
8) Bring up ODL2 and take down ODL3
9) Bring up ODL3
10) Ping all Vm Instance1 In net_2 From net_1
    Ping all Vm Instance1 In net_1 From net_2
    Connectivity Tests From Vm Instance1 In net_1 In Healthy Cluster
11) Take Down ODL1 and ODL2
    Connectivity Tests From Vm Instance1 In net_1 With Two ODLs Down
12) Bring Up ODL1 and ODL2
13) Take Down ODL2 and ODL3
    Connectivity Tests From Vm Instance1 In net_2
 
  sudo ip netns exec qdhcp-fe7cf9b9-1880-4df0-83e5-83a555d591a6 ssh -o MACs=hmac-sha1 -o ConnectTimeout=5 -o StrictHostKeyChecking=no cirros@37.0.0.11 -o UserKnownHostsFile=/dev/null
  No match found for 'password:' in 10 seconds
  Output: ssh: connect to host 37.0.0.11 port 22: Connection timed out
  [jenkins@sandbox-07429-2-1-devstack-queens-0 ~]> .
 

 

Logs

==================================================

https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/builder-copy-sandbox-logs/644

 



 Comments   
Comment by Srinivas Rachakonda [ 18/Dec/19 ]

Test still failing:

 

https://jenkins.opendaylight.org/releng/job/netvirt-csit-1node-0cmb-1ctl-2cmp-openstack-queens-upgrade-snat-conntrack-neon/

Comment by Nishchya Gupta [ 14/Jan/20 ]

Hi Srini,

 

The steps followed in this test case bring nodes and shards down and up multiple times. Below are the node and shard restarts performed in this suite.

 

Bring down Default shard leader

Bring up leader for default shard

Down then Up odl1

Down then Up odl2

Down then Up odl3

Down odl1 and odl2

Up odl1 and odl2

Down odl2 and odl3

Up odl2 and odl3

 

But in none of the above cases do we verify, after bringing the nodes/shards back up, what state the shards are in and which node is the owner of each shard.

 

However, in the logs I can see the lines below multiple times. As far as I know, they indicate that multiple shard owners may be present at the same time.

It even looks as if one node is in one cluster and another node is in a different cluster, meaning that two clusters exist at the same time. This results in inconsistency, and we observe multiple failures after that.

2019-12-05T04:47:46,895 | INFO  | opendaylight-cluster-data-akka.actor.default-dispatcher-19 | Cluster(akka://opendaylight-cluster-data) | 47 - com.typesafe.akka.slf4j - 2.5.25 | Cluster Node [akka.tcp://opendaylight-cluster-data@10.30.170.59:2550] - Node [akka.tcp://opendaylight-cluster-data@10.30.170.59:2550] is JOINING itself (with roles [member-1, dc-default]) and forming new cluster
2019-12-05T04:47:46,898 | INFO  | opendaylight-cluster-data-akka.actor.default-dispatcher-19 | Cluster(akka://opendaylight-cluster-data) | 47 - com.typesafe.akka.slf4j - 2.5.25 | Cluster Node [akka.tcp://opendaylight-cluster-data@10.30.170.59:2550] - is the new leader among reachable nodes (more leaders may exist)
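
To see how often and on which node this happens, the logs could be scanned for the "JOINING itself" message. The following is only a rough sketch: the function name find_self_joins is hypothetical, and the pattern is based solely on the two log entries quoted above, so other Akka versions may word the message differently.

```python
import re

# Pattern taken from the log entry quoted above: a node that joins itself
# and forms a new cluster. Wording may differ in other Akka versions.
SELF_JOIN = re.compile(
    r"Node \[(?P<node>akka\.tcp://[^\]]+)\] is JOINING itself .*forming new cluster"
)
# Every log entry starts with a timestamp such as 2019-12-05T04:47:46,895.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}")


def find_self_joins(log_text):
    """Return (timestamp, node_address) for every 'JOINING itself' entry."""
    # Entries can end up fused on one line, so split on the timestamps
    # rather than on newlines.
    starts = [m.start() for m in TIMESTAMP.finditer(log_text)]
    ends = starts[1:] + [len(log_text)]
    hits = []
    for begin, end in zip(starts, ends):
        entry = log_text[begin:end]
        match = SELF_JOIN.search(entry)
        if match:
            stamp = TIMESTAMP.match(entry).group(0)
            hits.append((stamp, match.group("node")))
    return hits
```

Running this over the karaf log of each ODL node and comparing the timestamps against the restart steps would show after which restart a node formed its own cluster.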

It would be better to change our script to log shard owner details after every restart, so that we know after which restart it actually fails.
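
A minimal sketch of such a check is given below. It assumes the shard status is read through the Jolokia REST endpoint on each ODL node; the port 8181, the admin/admin credentials, the MBean name layout, and the RaftState and Leader attributes are assumptions about a typical deployment and should be verified against the actual cluster. All function names are hypothetical.

```python
import base64
import json
import urllib.request


def shard_status_url(host, member, shard="default", store="config", port=8181):
    """Build the (assumed) Jolokia URL for one shard MBean on one node."""
    kind = ("DistributedConfigDatastore" if store == "config"
            else "DistributedOperationalDatastore")
    mbean = ("org.opendaylight.controller:Category=Shards,"
             f"name={member}-shard-{shard}-{store},type={kind}")
    return f"http://{host}:{port}/jolokia/read/{mbean}"


def fetch_shard_status(host, member, user="admin", password="admin", timeout=5):
    """Read RaftState and Leader for the default config shard of one node."""
    request = urllib.request.Request(shard_status_url(host, member))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        value = json.load(response)["value"]
    return {"RaftState": value.get("RaftState"), "Leader": value.get("Leader")}


def check_single_leader(statuses):
    """Return a list of problems found in a member -> status mapping."""
    problems = []
    leaders = sorted(m for m, s in statuses.items() if s["RaftState"] == "Leader")
    if len(leaders) != 1:
        problems.append(f"expected exactly one leader, found {leaders}")
    views = {str(s["Leader"]) for s in statuses.values()}
    if len(views) > 1:
        problems.append(f"members disagree on the leader: {sorted(views)}")
    return problems
```

Calling fetch_shard_status for every member after each down/up step and passing the results to check_single_leader would record the shard owner at each stage and flag the first restart after which the nodes disagree.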

It would also be good to have someone from the clustering/Akka team look into this, as we do not have expertise on the clustering side.

 

Regards,

Nishchya

Generated at Wed Feb 07 20:24:24 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.