[OPNFLWPLUG-1085] switch-idle-timeout does not work Created: 25/Feb/20  Updated: 06/Nov/20  Resolved: 05/Mar/20

Status: Resolved
Project: OpenFlowPlugin
Component/s: None
Affects Version/s: None
Fix Version/s: Magnesium

Type: Bug Priority: Highest
Reporter: Luis Gomez Assignee: dheenadayalan b
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

switch-idle-timeout used to work: it detected when a switch entered the idle state, for example when OF channel communication was blocked by a firewall.

When blocking communication from switch to controller:

sudo iptables -A OUTPUT -p tcp -d {controllerIP} --dport 6633 -j DROP

The controller should set the OF connection to idle after the default switch-idle-timeout of 15 seconds:


2020-02-25T02:52:58,126 | INFO  | ofppool-0        | SystemNotificationsListenerImpl  | 397 - org.opendaylight.openflowplugin.impl - 0.6.3 | Switch Idle state occurred, node=/10.1.64.222:47564|auxId=0

After that, the OF connection goes down (see the Oxygen karaf.log below):

2020-02-25T02:53:00,126 | WARN  | ofppool-0        | SystemNotificationsListenerImpl  | 397 - org.opendaylight.openflowplugin.impl - 0.6.3 | Exception while  waiting for echoReply from [/10.1.64.222:47564] in TIMEOUTING state: Waited 2000 milliseconds for com.google.common.util.concurrent.SettableFuture@33929ad[status=PENDING]
2020-02-25T02:53:00,127 | INFO  | ofppool-0        | SystemNotificationsListenerImpl  | 397 - org.opendaylight.openflowplugin.impl - 0.6.3 | ConnectionEvent:Closing connection as device is idle. Echo sent at Tue Feb 25 02:52:58 UTC 2020. Device:/10.1.64.222:47564, NodeId:openflow:18096668435427573921
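
For reference, the expected timeline can be modelled as a small state machine. This is only a minimal sketch, not the plugin's code: the 15 s idle timeout and the 2000 ms echo wait come from the description and log above, TIMEOUTING is the state named in the WARN line, and the Connection class, the WORKING and CLOSED state names, and the tick/on_message methods are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

IDLE_TIMEOUT_S = 15.0  # default switch-idle-timeout quoted in the description
ECHO_WAIT_S = 2.0      # "Waited 2000 milliseconds" in the WARN line


@dataclass
class Connection:
    """Toy model of one OF connection (hypothetical, not the plugin's class)."""
    last_rx: float                        # time the last message arrived from the switch
    echo_sent_at: Optional[float] = None  # time the echo request was sent, if any
    state: str = "WORKING"                # WORKING -> TIMEOUTING -> CLOSED

    def on_message(self, now: float) -> None:
        """Any message from the switch (including an echo reply) resets the idle timer."""
        if self.state == "CLOSED":
            return
        self.last_rx = now
        self.echo_sent_at = None
        self.state = "WORKING"

    def tick(self, now: float) -> str:
        """Advance the model to time `now` and return the resulting state."""
        if self.state == "WORKING" and now - self.last_rx >= IDLE_TIMEOUT_S:
            # Idle detected: record the echo request and wait for the reply.
            self.echo_sent_at = now
            self.state = "TIMEOUTING"
        elif self.state == "TIMEOUTING" and now - self.echo_sent_at >= ECHO_WAIT_S:
            # No echo reply within the wait window: close the connection.
            self.state = "CLOSED"
        return self.state
```

With the last message received at t=0, tick(15.0) moves the connection to TIMEOUTING and tick(17.0) closes it, which matches the two-second gap between the INFO and WARN lines above.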

The above does not happen in Neon, Sodium or Magnesium.



 Comments   
Comment by Tejas Nevrekar [ 25/Feb/20 ]

This may be related to https://jira.opendaylight.org/browse/OPNFLWPLUG-1058 aka https://git.opendaylight.org/gerrit/gitweb?p=openflowplugin.git;a=commit;h=f985dee509f3a142552076da87dad3f0176d5b20 

Shweta_Chaturvedi could you please comment?

Comment by Luis Gomez [ 25/Feb/20 ]

That would explain the missing log line, but I still do not see the switch connection going down.

Comment by Arunprakash D [ 26/Feb/20 ]

Dheena has figured out that the review below seems to be causing this issue.

https://git.opendaylight.org/gerrit/c/openflowplugin/+/78199

 

The switch idle state is triggered again if we revert the above review. He is working on a common fix and will update here based on the findings.

Comment by dheenadayalan b [ 27/Feb/20 ]

ecelgp, a fix has been provided in the change below:

https://git.opendaylight.org/gerrit/c/openflowplugin/+/88090

The idle handler has been moved to just above the delegating inbound handler in the channel pipeline.

The reason is that the switch-idle event fired by the idle handler was never consumed by the delegating inbound handler when the idle handler was added at the end of the pipeline; by then, the delegating inbound handler would have already consumed the channel-inactive message.
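
To illustrate why the position matters, here is a toy model of a channel pipeline in which an event fired by one handler is only offered to the handlers placed after it. It is a sketch under assumptions, not the Netty or openflowjava API: the Pipeline, PassThrough and DelegatingInbound classes, the handler names, and the "SWITCH_IDLE" event string are all hypothetical.

```python
class Pipeline:
    """Toy stand-in for a channel pipeline (hypothetical, not the Netty API)."""

    def __init__(self):
        self._handlers = []  # (name, handler) pairs, head of the pipeline first

    def _index_of(self, name):
        for i, (handler_name, _) in enumerate(self._handlers):
            if handler_name == name:
                return i
        raise KeyError(name)

    def add_last(self, name, handler):
        self._handlers.append((name, handler))

    def add_before(self, base_name, name, handler):
        self._handlers.insert(self._index_of(base_name), (name, handler))

    def fire_user_event_from(self, source_name, event):
        """Offer `event` to the handlers after `source_name`.

        Returns the name of the handler that consumed it, or None.
        """
        start = self._index_of(source_name) + 1
        for name, handler in self._handlers[start:]:
            if handler.on_user_event(event):
                return name
        return None


class PassThrough:
    """Handler that never consumes user events."""

    def on_user_event(self, event):
        return False


class DelegatingInbound:
    """Handler that consumes every user event it is offered."""

    def __init__(self):
        self.seen = []

    def on_user_event(self, event):
        self.seen.append(event)
        return True


def build(idle_before_delegating):
    """Build a pipeline with the idle handler either before or after delegating inbound."""
    pipeline = Pipeline()
    delegating = DelegatingInbound()
    pipeline.add_last("decoder", PassThrough())  # hypothetical upstream handler
    pipeline.add_last("delegating-inbound", delegating)
    if idle_before_delegating:
        pipeline.add_before("delegating-inbound", "idle-handler", PassThrough())
    else:
        pipeline.add_last("idle-handler", PassThrough())
    return pipeline, delegating


broken, broken_delegating = build(idle_before_delegating=False)
fixed, fixed_delegating = build(idle_before_delegating=True)
lost = broken.fire_user_event_from("idle-handler", "SWITCH_IDLE")
delivered = fixed.fire_user_event_from("idle-handler", "SWITCH_IDLE")
```

In this model `lost` is None because nothing sits after the idle handler, while `delivered` is "delegating-inbound" once the idle handler is placed in front of it.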

After the above fix, we were unable to reproduce the issue reported in OPNFLWPLUG-1039 for a cluster setup, and we confirmed that the cs-it test (test-openflowplugin-core) passes.

Thanks,

Dheena

Comment by Luis Gomez [ 27/Feb/20 ]

Thanks dheenab. Arunprakash since the fix was added so quickly and it is a regression, I think we should include it in Magnesium. Please cherry-pick.

Comment by Jamo Luhrsen [ 05/Mar/20 ]

ecelgp, the stable/magnesium cherry-pick is merged. Can we close this blocker bug?

Generated at Wed Feb 07 20:34:09 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.