[OPNFLWPLUG-728] Entity Owner not resolved properly after owner is isolated or rejoined with iptables Created: 08/Jul/16  Updated: 27/Sep/21  Resolved: 26/Oct/16

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Luis Gomez Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File 6177.txt     Text File Bug#6177_Observation.txt    
Issue Links:
Blocks
is blocked by MDSAL-195 ClusterSingletonService is not closed... Resolved
Duplicate
is duplicated by OPNFLWPLUG-755 isolate+rejoin issue Resolved
External issue ID: 6177

 Description   

To reproduce, install a controller cluster in Beryllium (Be) or Boron (B), then:

1) Connect switch to 3 nodes
2) Find out switch owner
3) Isolate owner from other nodes using iptables:
sudo iptables -A OUTPUT -d {other-controller-IP} -j DROP
sudo iptables -A OUTPUT -d {other-controller-IP} -j DROP
4) Check entity ownership in cluster node
5) After a while, rejoin the cluster:
sudo iptables -D OUTPUT -d {other-controller-IP} -j DROP
sudo iptables -D OUTPUT -d {other-controller-IP} -j DROP
6) Check entity ownership in cluster node

Either 4) or 6) fails sporadically.
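
The isolate and rejoin steps above can be sketched as a small shell helper. This is only an illustration, not the script used by the CSIT job: the function names (partition_rules, isolate, rejoin) and the RUN switch are hypothetical. By default it only prints the iptables commands, one per peer controller, so the rules can be reviewed before they are applied.

```shell
#!/bin/sh
# Sketch only: helper names and the RUN switch are hypothetical.
# Leave RUN empty to print the commands; set RUN=1 to execute them (needs sudo).

partition_rules() {
    action=$1              # -A adds the DROP rule (isolate), -D deletes it (rejoin)
    shift
    for peer in "$@"; do
        cmd="sudo iptables $action OUTPUT -d $peer -j DROP"
        if [ -n "${RUN:-}" ]; then
            $cmd           # run the rule against the live firewall
        else
            echo "$cmd"    # dry run: show what would be executed
        fi
    done
}

# Run on the owner node, passing the IPs of the other controllers.
isolate() { partition_rules -A "$@"; }
rejoin()  { partition_rules -D "$@"; }
```

For example, `isolate {other-controller-IP} {other-controller-IP}` on the owner node prints the two rules from step 3, and `rejoin` with the same arguments prints the matching deletes from step 5.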



 Comments   
Comment by Anil Vishnoi [ 08/Jul/16 ]

Hi Luis,

Is this for He plugin or Li plugin or both?

Comment by Luis Gomez [ 08/Jul/16 ]

Both, see the mail I just sent.

Comment by Shuva Jyoti Kar [ 21/Jul/16 ]

Luis, is this issue observed with OF-HA?

Comment by Shuva Jyoti Kar [ 21/Jul/16 ]

Also, Luis, is this issue sporadic or consistent?

Comment by Luis Gomez [ 21/Jul/16 ]

As described above, the switch connects to all 3 nodes.

It happens consistently with the Li plugin and, lately, only very rarely with the He plugin.

Comment by Arthi Bhattacharjee [ 29/Jul/16 ]

Hi Luis,

This is the procedure I followed for the bug:
• Got the latest Beryllium distribution.
• Installed the openflowplugin features.
• Added a bridge and set the controllers on the nodes (one control node and two compute nodes).
[stack@compute2 devstack]$ sudo ovs-vsctl --no-wait add-br br-int
[stack@compute2 devstack]$ sudo ovs-vsctl set bridge br-int protocols=["OpenFlow13"]
[stack@compute2 devstack]$ sudo ovs-vsctl set-controller br-int tcp:10.106.138.154:6653 tcp:10.106.138.155:6653 tcp:10.106.138.110:6653
• Isolate owner from other nodes using iptables.
sudo iptables -A OUTPUT -d 10.106.138.110 -j DROP
sudo iptables -A OUTPUT -d 10.106.138.154 -j DROP
The members are removed from the entity.
• Rejoin the controllers to the cluster.
sudo iptables -D OUTPUT -d 10.106.138.110 -j DROP
sudo iptables -D OUTPUT -d 10.106.138.154 -j DROP
The members are added back to the entity as candidates.
Please find attached the entity owner observations.

Since you mentioned that this bug reproduces only sporadically, I repeated the test 3 times for each plugin (Li and He).
Each time, the members were removed from and re-added to the entity list properly.

Queries:
1. You mentioned “Either 4) or 6) fails”. Does “fails” mean that the members are not properly removed from, or re-added to, the entity list?
2. Should I look at CSIT regarding this bug?

Thanks,
Arthi

Comment by Arthi Bhattacharjee [ 29/Jul/16 ]

Attachment Bug#6177_Observation.txt has been added with description: Entity Owner Observation Document

Comment by Luis Gomez [ 02/Aug/16 ]

OK, let's track the yellow builds on this job (the red ones are infra issues):

https://jenkins.opendaylight.org/releng/job/openflowplugin-csit-3node-clustering-only-beryllium/

As I commented before, the issue seems to happen very much with the Boron (Li redesign) plugin but not often with Beryllium (He redesign).

Comment by Andrej Leitner [ 09/Aug/16 ]

Hi guys, this should be solved within OPNFLWPLUG-659.
(using cluster singleton service instead of EOS)

Comment by Miroslav Macko [ 25/Aug/16 ]

We have tested https://git.opendaylight.org/gerrit/#/c/44664/ and there are still issues. Jozef will continue working on it. There is no need to merge it yet.

Thanks,
Miro

Comment by Luis Gomez [ 25/Aug/16 ]

The isolate + rejoin issue is present in Beryllium -> It is not a regression in Boron -> It is not a blocker. Still good to fix if we can.

Comment by Andrej Leitner [ 31/Aug/16 ]

Marked as dependent on MDSAL-195 which is probably the real cause.

Comment by Luis Gomez [ 13/Sep/16 ]

After fixing the test, the latest and probably the only issue observed is a change of owner when an instance is rejoined to the cluster. For example, if I rejoin member 1 after it was isolated, the device owner changes from 2 to 3. Kind of weird, but this is what I see in the test.

Comment by Arthi Bhattacharjee [ 17/Sep/16 ]

Setup:
1 control and 2 compute nodes

Distro: distribution-karaf-0.4.4-20160916.220725-251.tar.gz

I've tested the scenario and observed that the ownership does not change on rejoin.
For example, starting from:
"candidate":[{"name":"member-1"},{"name":"member-2"},{"name":"member-3"}],"owner":"member-1"}

I isolate member-1:
sudo iptables -A OUTPUT -d 10.106.138.136 -j DROP

Then member-2 becomes owner.

Then I rejoin member-1:
sudo iptables -D OUTPUT -d 10.106.138.136 -j DROP

The ownership does not change; it is still member-2.
I tested this 3 times and observed the same result each time.
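
A minimal sketch of how this owner check could be scripted, assuming the entity JSON is on a single line with an "owner" field as in the output above. The function names entity_owner and owner_unchanged are hypothetical, and how the JSON is fetched from the controller is left out on purpose.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; input is entity JSON text
# containing an "owner" field, as in the observation above.

# Print the value of the "owner" field read from stdin.
entity_owner() {
    sed -n 's/.*"owner"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# owner_unchanged BEFORE_JSON AFTER_JSON
# Succeeds only when both snapshots name the same, non-empty owner.
owner_unchanged() {
    before=$(printf '%s\n' "$1" | entity_owner)
    after=$(printf '%s\n' "$2" | entity_owner)
    [ -n "$before" ] && [ "$before" = "$after" ]
}
```

Taking one snapshot after the isolation (owner member-2) and one after the rejoin, `owner_unchanged "$isolated" "$rejoined"` should succeed if the behaviour matches what I observed.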

Comment by Arthi Bhattacharjee [ 17/Sep/16 ]

Attachment 6177.txt has been added with description: 6177_observation

Comment by Luis Gomez [ 07/Oct/16 ]

It has been a while since I last looked at this issue, and I think you are right: I do not see the change of owner anymore. However, I still see some sporadic failures where the isolated member does not rejoin the cluster after the iptables rule is deleted:

https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-3node-clustering-only-boron/724/archives/log.html.gz

Comment by Vaclav Demcak [ 13/Oct/16 ]

Manual testing and the Carbon/Boron Jenkins jobs look OK.

Generated at Wed Feb 07 20:33:14 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.