[OPNFLWPLUG-748] Switch is still seen after owner goes down in non OFHA scenario Created: 18/Aug/16  Updated: 27/Sep/21  Resolved: 16/May/17

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Luis Gomez Assignee: Luis Gomez
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Duplicate
is duplicated by OPNFLWPLUG-757 Flow installation in OVS switch cause... Resolved
External issue ID: 6459

 Description   

This issue is tracked here:

https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-periodic-bulkomatic-clustering-daily-only-boron/

Similar issue was resolved for He plugin:

https://bugs.opendaylight.org/show_bug.cgi?id=6058

BR/Luis



 Comments   
Comment by Jozef Bacigal [ 18/Aug/16 ]

Just wondering: is FRM reconciliation design-dependent?

Comment by Shuva Jyoti Kar [ 18/Aug/16 ]

(In reply to Jozef Bacigal from comment #1)
> Just wondering: is FRM reconciliation design-dependent?

What do you mean by design-dependent? Whether it is linked to the He/Li plugin design? No. The moment a node connects, we check whether this instance is the master and then push the configuration. When I debug I can see it happening, so I am not sure whether this is a heisenbug.

Comment by Shuva Jyoti Kar [ 19/Aug/16 ]

The issue with 6058 was that the EOS had a stale entry, so reconciliation was not triggered because the DTCL was not fired after a node disconnect. We need to check whether that is the case here.

Comment by Tomas Slusny [ 20/Sep/16 ]

I saw a lot of NPEs in the logs. DeviceMasterShipManager was created after we registered the forwarders in FRM, so it could still be null when we received a DataTreeChange event and wanted to check whether the node is the owner.

Here is the link to Gerrit: https://git.opendaylight.org/gerrit/#/c/45877
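
The ordering problem described in this comment can be sketched in a few lines. This is only an illustration, not the code from the Gerrit change: `MastershipManager`, `Forwarder`, `Provider`, `broken_startup` and `fixed_startup` are hypothetical names, and Python's `AttributeError` on `None` stands in for the Java NPE.

```python
class MastershipManager:
    """Hypothetical stand-in for DeviceMasterShipManager."""

    def __init__(self, owned_nodes):
        self._owned = set(owned_nodes)

    def is_owner(self, node_id):
        return node_id in self._owned


class Provider:
    """Hypothetical holder of the shared manager reference."""

    def __init__(self):
        self.manager = None
        self.forwarder = None


class Forwarder:
    """Hypothetical stand-in for an FRM forwarder that reacts to data tree changes."""

    def __init__(self, provider):
        self._provider = provider
        self.pushed = []

    def on_data_tree_changed(self, node_id, flow):
        # Raises AttributeError (the analogue of an NPE) when the manager
        # has not been created yet.
        if self._provider.manager.is_owner(node_id):
            self.pushed.append((node_id, flow))


def broken_startup():
    # Forwarder is registered first; the manager is still None, so any
    # event arriving in this window fails.
    provider = Provider()
    provider.forwarder = Forwarder(provider)
    return provider


def fixed_startup(owned_nodes):
    # Manager is created before the forwarder can receive events.
    provider = Provider()
    provider.manager = MastershipManager(owned_nodes)
    provider.forwarder = Forwarder(provider)
    return provider
```

With `fixed_startup`, an event for an owned node is pushed and an event for any other node is ignored; with `broken_startup`, the first event fails before the ownership check can run.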

Comment by Luis Gomez [ 20/Sep/16 ]

My observation on this bug is that the controller does not support a restart of the device owner when switches connect only to the owner. The entity owner, inventory and topology APIs freeze and never leave this state after the owner goes down.

Comment by Vratko Polak [ 10/Feb/17 ]

> This issue is tracked here:

I see a reconciliation suite passing [0]. Where should we look more specifically?

[0] https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-3node-periodic-bulkomatic-clustering-daily-only-boron/237/archives/log.html.gz#s1-s1

Comment by Vratko Polak [ 10/Feb/17 ]

Same on Carbon [1].

[1] https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-3node-periodic-bulkomatic-clustering-daily-only-carbon/176/archives/log.html.gz#s1-s1

Comment by Luis Gomez [ 10/Feb/17 ]

The red tests, right? In particular the Check No Network and Verify Flows in Switch TCs.

Comment by Abhijit Kumbhare [ 23/Feb/17 ]

Muthu,

Can you check this?

Abhijit

Comment by Luis Gomez [ 23/Feb/17 ]

So I think this bug is related to the recently exposed cluster issue where instances not connected to the switch do not get proper notification of switch events. This is a gap in the singleton API, but I think there has been some decision to work around the problem in openflowplugin.

Comment by Abhijit Kumbhare [ 23/Mar/17 ]

Can you confirm whether this still exists? The current Jenkins job seems to be passing:
https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-periodic-bulkomatic-clustering-daily-only-boron/

Comment by Abhijit Kumbhare [ 23/Mar/17 ]

Assigning to Luis for confirmation.

Comment by Luis Gomez [ 23/Mar/17 ]

This bug is fixed through https://bugs.opendaylight.org/show_bug.cgi?id=7736

Comment by Luis Gomez [ 29/Mar/17 ]

Reopening for Carbon: we can still see an alien switch in the cluster when the owner (the only node connecting to the switch) goes down:

https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-3node-periodic-bulkomatic-clustering-daily-only-carbon/219/archives/log.html.gz

Comment by Luis Gomez [ 14/Apr/17 ]

New example for Carbon where switch is still seen after owner goes down.

https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-3node-periodic-bulkomatic-clustering-daily-only-carbon/229/archives/log.html.gz

Comment by Jozef Bacigal [ 24/Apr/17 ]

Luis, to find some answers I have one quick question.

What exactly does "kill controller" mean? Do you close the karaf instance or just disable the network connection? Is there only one controller?

Comment by Luis Gomez [ 24/Apr/17 ]

We kill the karaf process.

Comment by Luis Gomez [ 25/Apr/17 ]

This is how to reproduce manually:

  • Connect the switch to 1 node out of the 3 that form the cluster.
  • Kill the karaf process of that node.
  • Check that switch operational info (inv & topo) is still available even though no switch is connected to the cluster anymore.
  • Disconnect the switch from the killed node.
  • Start the killed node.
  • Check that switch operational info (inv & topo) is still available even though no switch is connected to the cluster anymore.
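
The two "check" steps above could be scripted roughly as follows. This is a minimal sketch, not the CSIT suite's code: it assumes the operational inventory is read from the RESTCONF path `/restconf/operational/opendaylight-inventory:nodes` on port 8181 with `admin`/`admin` credentials, and that the JSON has the shape `{"nodes": {"node": [{"id": "openflow:1", ...}]}}`. The function names `openflow_nodes` and `fetch_inventory` are hypothetical.

```python
import base64
import json
import urllib.request

# Assumed RESTCONF path for the operational inventory.
INVENTORY_PATH = "/restconf/operational/opendaylight-inventory:nodes"


def openflow_nodes(inventory):
    """Return the sorted IDs of OpenFlow nodes found in a parsed inventory document."""
    nodes = inventory.get("nodes", {}).get("node", [])
    return sorted(
        n["id"] for n in nodes if n.get("id", "").startswith("openflow:")
    )


def fetch_inventory(host, port=8181, user="admin", password="admin"):
    """Fetch the operational inventory from one cluster member (assumed defaults)."""
    url = f"http://{host}:{port}{INVENTORY_PATH}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `openflow_nodes(fetch_inventory(host))` against a surviving cluster member after the owner is killed should return an empty list; a non-empty list at that point means the stale switch is still being reported.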

BR/Luis

Comment by Jozef Bacigal [ 02/May/17 ]

https://git.opendaylight.org/gerrit/#/c/56230/

Comment by Luis Gomez [ 15/May/17 ]

Patch looks good.

Comment by Luis Gomez [ 15/May/17 ]

Raising to blocker as this works in Boron.

Comment by A H [ 15/May/17 ]

Patch https://git.opendaylight.org/gerrit/#/c/57090/ was successfully merged. Can we mark this bug as fixed and resolved?

Comment by Luis Gomez [ 16/May/17 ]

Yes we can close this.

Generated at Wed Feb 07 20:33:17 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.