[OPNFLWPLUG-1007] Future (eventually) failed: addFlow -and- transmitPacket Created: 23/Apr/18  Updated: 17/Sep/18  Resolved: 13/Aug/18

Status: Resolved
Project: OpenFlowPlugin
Component/s: clustering
Affects Version/s: None
Fix Version/s: Oxygen-SR4, Fluorine

Type: New Feature Priority: High
Reporter: Jamo Luhrsen Assignee: Gobinath Suganthan
Resolution: Done Votes: 0
Labels: CSIT, csit:3node, csit:exception, csit:sporadic
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates CONTROLLER-1788 Failed to execute RPC (urn:opendaylig... Resolved
Relates
relates to OPNFLWPLUG-1024 Break out the odl-openflowplugin-app-... Resolved
relates to NETVIRT-1031 Review controller-max-backoff and con... Resolved
Epic Link: Clustering Stability

 Description   

It's not clear (yet) what kind of problem (if any) this Exception poses, but the netvirt
clustering CSIT jobs are unstable. The following Exception stood out as something
scary. Potentially, missing flows could be one reason some of the openstack
instances do not have proper connectivity and test cases fail.

 

2018-04-20T06:25:17,932 | ERROR | opendaylight-cluster-data-notification-dispatcher-94 | DefaultConfigPusher              | 374 - org.opendaylight.openflowplugin.applications.of-switch-config-pusher - 0.6.1 | Future (eventually) failed: addFlow
org.opendaylight.controller.md.sal.dom.api.DOMRpcImplementationNotAvailableException: No implementation of RPC AbsoluteSchemaPath{path=[(urn:opendaylight:module:config?revision=2014-10-15)set-config]} available
        at org.opendaylight.controller.md.sal.dom.broker.impl.RoutedDOMRpcRoutingTableEntry.invokeRpc(RoutedDOMRpcRoutingTableEntry.java:85) [218:org.opendaylight.controller.sal-broker-impl:1.7.1]
        at org.opendaylight.controller.md.sal.dom.broker.impl.DOMRpcRoutingTable.invokeRpc(DOMRpcRoutingTable.java:178) [218:org.opendaylight.controller.sal-broker-impl:1.7.1]
        at org.opendaylight.controller.md.sal.dom.broker.impl.DOMRpcRouter.invokeRpc(DOMRpcRouter.java:102) [218:org.opendaylight.controller.sal-broker-impl:1.7.1]
        at Proxyaec100e4_e760_48b8_b4f7_2f391b651b90.invokeRpc(Unknown Source) [?:?]
        at Proxyc11c9ab5_3bc0_4a35_92fb_52a7d17ea7ca.invokeRpc(Unknown Source) [?:?]
        at org.opendaylight.controller.md.sal.binding.impl.RpcServiceAdapter.invoke0(RpcServiceAdapter.java:68) [215:org.opendaylight.controller.sal-binding-broker-impl:1.7.1]
        at org.opendaylight.controller.md.sal.binding.impl.RpcServiceAdapter.access$000(RpcServiceAdapter.java:46) [215:org.opendaylight.controller.sal-binding-broker-impl:1.7.1]
        at org.opendaylight.controller.md.sal.binding.impl.RpcServiceAdapter$RpcInvocationStrategy.invoke(RpcServiceAdapter.java:165) [215:org.opendaylight.controller.sal-binding-broker-impl:1.7.1]
        at org.opendaylight.controller.md.sal.binding.impl.RpcServiceAdapter.invoke(RpcServiceAdapter.java:99) [215:org.opendaylight.controller.sal-binding-broker-impl:1.7.1]
        at com.sun.proxy.$Proxy97.setConfig(Unknown Source) [386:org.opendaylight.openflowplugin.model.flow-service:0.6.1]
        at org.opendaylight.openflowplugin.openflow.ofswitch.config.DefaultConfigPusher.onDataTreeChanged(DefaultConfigPusher.java:84) [374:org.opendaylight.openflowplugin.applications.of-switch-config-pusher:0.6.1]
        at org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataTreeChangeListenerAdapter.onDataTreeChanged(BindingDOMDataTreeChangeListenerAdapter.java:41) [215:org.opendaylight.controller.sal-binding-broker-impl:1.7.1]
        at org.opendaylight.controller.cluster.datastore.DataTreeChangeListenerActor.dataChanged(DataTreeChangeListenerActor.java:67) [228:org.opendaylight.controller.sal-distributed-datastore:1.7.1]
        at org.opendaylight.controller.cluster.datastore.DataTreeChangeListenerActor.handleReceive(DataTreeChangeListenerActor.java:41) [228:org.opendaylight.controller.sal-distributed-datastore:1.7.1]
        at org.opendaylight.controller.cluster.common.actor.AbstractUntypedActor.onReceive(AbstractUntypedActor.java:38) [221:org.opendaylight.controller.sal-clustering-commons:1.7.1]
        at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167) [42:com.typesafe.akka.actor:2.5.11]
        at akka.actor.Actor.aroundReceive(Actor.scala:517) [42:com.typesafe.akka.actor:2.5.11]
        at akka.actor.Actor.aroundReceive$(Actor.scala:515) [42:com.typesafe.akka.actor:2.5.11]
        at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97) [42:com.typesafe.akka.actor:2.5.11]
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:590) [42:com.typesafe.akka.actor:2.5.11]
        at akka.actor.ActorCell.invoke(ActorCell.scala:559) [42:com.typesafe.akka.actor:2.5.11]
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) [42:com.typesafe.akka.actor:2.5.11]
        at akka.dispatch.Mailbox.run(Mailbox.scala:224) [42:com.typesafe.akka.actor:2.5.11]
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234) [42:com.typesafe.akka.actor:2.5.11]
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [42:com.typesafe.akka.actor:2.5.11]
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [42:com.typesafe.akka.actor:2.5.11]
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [42:com.typesafe.akka.actor:2.5.11]
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [42:com.typesafe.akka.actor:2.5.11]
2018-04-20T06:25:17,933 | INFO  | ForkJoinPool-1-worker-7 | FlowBasedEgressServicesStateBindHelper | 252 - org.opendaylight.genius.interfacemanager-impl - 0.4.1 | bindServicesOnInterfaceType Egress - WIP
2018-04-20T06:25:17,934 | ERROR | opendaylight-cluster-data-notification-dispatcher-103 | LLDPSpeaker                      | 373 - org.opendaylight.openflowplugin.applications.lldp-speaker - 0.6.1 | Future (eventually) failed: transmitPacket
org.opendaylight.controller.md.sal.dom.api.DOMRpcImplementationNotAvailableException: Action Local{type=AbsoluteSchemaPath{path=[(urn:opendaylight:packet:service?revision=2013-07-09)transmit-packet]}, contextReference=/(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:275486970454348}]} has no instance matching org.opendaylight.controller.md.sal.binding.impl.LazySerializedContainerNode$WithContextRef@4a0d2b2
        at org.opendaylight.controller.blueprint.ext.ActionProviderBean.lambda$registerFallback$0(ActionProviderBean.java:133) ~[195:org.opendaylight.controller.blueprint:0.8.1]
        at org.opendaylight.controller.md.sal.dom.broker.impl.RoutedDOMRpcRoutingTableEntry.invokeRpc(RoutedDOMRpcRoutingTableEntry.java:71) [218:org.opendaylight.controller.sal-broker-impl:1.7.1]
        at org.opendaylight.controller.md.sal.dom.broker.impl.DOMRpcRoutingTable.invokeRpc(DOMRpcRoutingTable.java:178) [218:org.opendaylight.controller.sal-broker-impl:1.7.1]
        at org.opendaylight.controller.md.sal.dom.broker.impl.DOMRpcRouter.invokeRpc(DOMRpcRouter.java:102) [218:org.opendaylight.controller.sal-broker-impl:1.7.1]
        at Proxyaec100e4_e760_48b8_b4f7_2f391b651b90.invokeRpc(Unknown Source) [?:?]
        at Proxyc11c9ab5_3bc0_4a35_92fb_52a7d17ea7ca.invokeRpc(Unknown Source) [?:?]
        at org.opendaylight.controller.md.sal.binding.impl.RpcServiceAdapter.invoke0(RpcServiceAdapter.java:68) [215:org.opendaylight.controller.sal-binding-broker-impl:1.7.1]
        at org.opendaylight.controller.md.sal.binding.impl.RpcServiceAdapter.access$000(RpcServiceAdapter.java:46) [215:org.opendaylight.controller.sal-binding-broker-impl:1.7.1]
        at org.opendaylight.controller.md.sal.binding.impl.RpcServiceAdapter$RpcInvocationStrategy.invoke(RpcServiceAdapter.java:165) [215:org.opendaylight.controller.sal-binding-broker-impl:1.7.1]
        at org.opendaylight.controller.md.sal.binding.impl.RpcServiceAdapter.invoke(RpcServiceAdapter.java:99) [215:org.opendaylight.controller.sal-binding-broker-impl:1.7.1]
        at com.sun.proxy.$Proxy133.transmitPacket(Unknown Source) [386:org.opendaylight.openflowplugin.model.flow-service:0.6.1]
        at org.opendaylight.openflowplugin.applications.lldpspeaker.LLDPSpeaker.nodeConnectorAdded(LLDPSpeaker.java:190) [373:org.opendaylight.openflowplugin.applications.lldp-speaker:0.6.1]
        at org.opendaylight.openflowplugin.applications.lldpspeaker.NodeConnectorInventoryEventTranslator.notifyNodeConnectorAppeared(NodeConnectorInventoryEventTranslator.java:179) [373:org.opendaylight.openflowplugin.applications.lldp-speaker:0.6.1]
        at org.opendaylight.openflowplugin.applications.lldpspeaker.NodeConnectorInventoryEventTranslator.processAddedConnector(NodeConnectorInventoryEventTranslator.java:124) [373:org.opendaylight.openflowplugin.applications.lldp-speaker:0.6.1]
        at org.opendaylight.openflowplugin.applications.lldpspeaker.NodeConnectorInventoryEventTranslator.onDataTreeChanged(NodeConnectorInventoryEventTranslator.java:102) [373:org.opendaylight.openflowplugin.applications.lldp-speaker:0.6.1]
        at org.opendaylight.controller.md.sal.binding.impl.BindingDOMDataTreeChangeListenerAdapter.onDataTreeChanged(BindingDOMDataTreeChangeListenerAdapter.java:41) [215:org.opendaylight.controller.sal-binding-broker-impl:1.7.1]
        at org.opendaylight.controller.cluster.datastore.DataTreeChangeListenerActor.dataChanged(DataTreeChangeListenerActor.java:67) [228:org.opendaylight.controller.sal-distributed-datastore:1.7.1]
        at org.opendaylight.controller.cluster.datastore.DataTreeChangeListenerActor.handleReceive(DataTreeChangeListenerActor.java:41) [228:org.opendaylight.controller.sal-distributed-datastore:1.7.1]
        at org.opendaylight.controller.cluster.common.actor.AbstractUntypedActor.onReceive(AbstractUntypedActor.java:38) [221:org.opendaylight.controller.sal-clustering-commons:1.7.1]
        at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167) [42:com.typesafe.akka.actor:2.5.11]
        at akka.actor.Actor.aroundReceive(Actor.scala:517) [42:com.typesafe.akka.actor:2.5.11]
        at akka.actor.Actor.aroundReceive$(Actor.scala:515) [42:com.typesafe.akka.actor:2.5.11]
        at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97) [42:com.typesafe.akka.actor:2.5.11]
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:590) [42:com.typesafe.akka.actor:2.5.11]
        at akka.actor.ActorCell.invoke(ActorCell.scala:559) [42:com.typesafe.akka.actor:2.5.11]
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) [42:com.typesafe.akka.actor:2.5.11]
        at akka.dispatch.Mailbox.run(Mailbox.scala:224) [42:com.typesafe.akka.actor:2.5.11]
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234) [42:com.typesafe.akka.actor:2.5.11]
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [42:com.typesafe.akka.actor:2.5.11]
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [42:com.typesafe.akka.actor:2.5.11]
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [42:com.typesafe.akka.actor:2.5.11]
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [42:com.typesafe.akka.actor:2.5.11]

The above trace was taken from this karaf log from this job. The other two ODL karaf logs are here and here.



 Comments   
Comment by Anil Vishnoi [ 03/May/18 ]

jluhrsen Looks like netvirt is using the odl-openflowplugin-app-config-pusher feature, which sends an OpenFlow message to the switch to configure it to send the whole packet to the controller. This feature is not cluster aware, so an instance of it runs as active on every controller, and all instances push the change down to the switch. The feature uses a routed RPC call to send this request to the switch. At the plugin level, the controller that is master of the device registers the routed RPC. Once registered, the routed RPC is advertised to the other controllers in the cluster. This advertisement can be delayed, so the other controller instances might not see it instantly.

 

Now, the exception that you pasted can happen for the following three reasons:

(1) The data change notification was received by this controller instance, but the routed RPC had not yet been pushed to it, so when it fired the RPC it found no routed RPC implementation.

(2) The device disconnected from the controller at the moment the routed RPC call was fired.

(3) The cluster was in a split-brain situation and had not yet converged.

In this scenario it looks like the device had just connected, but the routed RPC was not registered by that time (scenario 1 or scenario 3).
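One quick way to tell these scenarios apart is to correlate the "Future (eventually) failed" errors with the device connect/disconnect events in the karaf log. A minimal triage sketch — the file name sample_karaf.log and the log excerpt below are fabricated for illustration, but real karaf logs carry the same message prefixes:

```shell
# Build a tiny sample log standing in for odl*_karaf.log (contents are illustrative).
cat > sample_karaf.log <<'EOF'
2018-04-20T06:23:01,090 | INFO  | DeviceManagerImpl | ConnectionEvent: Device connected to controller, NodeId:Uri [_value=openflow:139226422832200]
2018-04-20T06:25:17,932 | ERROR | DefaultConfigPusher | Future (eventually) failed: addFlow
2018-04-20T06:25:17,934 | ERROR | LLDPSpeaker | Future (eventually) failed: transmitPacket
EOF

# Pull out connect events and the failed-future errors, in timestamp order:
# failures shortly after a connect suggest scenario 1 (RPC not yet advertised),
# failures around a disconnect suggest scenario 2.
grep -E 'ConnectionEvent|Future \(eventually\) failed' sample_karaf.log
```

Running the same grep against each of the three controllers' logs also shows which node was master (it will have the connect event but no failure).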

 

2018-04-20T06:23:01,090 | INFO  | epollEventLoopGroup-9-2 | DeviceManagerImpl                | 383 - org.opendaylight.openflowplugin.impl - 0.6.1 | ConnectionEvent: Device connected to controller, Device:/10.30.170.142:44160, NodeId:Uri [_value=openflow:139226422832200]
2018-04-20T06:23:01,092 | INFO  | epollEventLoopGroup-9-2 | RoleContextImpl                  | 383 - org.opendaylight.openflowplugin.impl - 0.6.1 | Started timer for setting SLAVE role on device openflow:139226422832200 if no role will be set in 20s.
2018-04-20T06:23:01,100 | INFO  | org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.network.topology.topology.Node_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | ElanBridgeManager                | 345 - org.opendaylight.netvirt.elanmanager-impl - 0.6.1 | addPatchPort: node: ovsdb://uuid/6b7672d2-948c-4e66-be61-9591790d1117/bridge/br-physnet1, bridge: br-physnet1, portname: br-physnet1-int-patch peer: br-physnet1-pa status: success
2018-04-20T06:23:01,106 | INFO  | opendaylight-cluster-data-akka.actor.default-dispatcher-80 | DeviceOwnershipStatusService     | 373 - org.opendaylight.openflowplugin.applications.lldp-speaker - 0.6.1 | Entity ownership change received for node : openflow:139226422832200 : EntityOwnershipChange [entity=Entity [type=org.opendaylight.mdsal.ServiceEntityType, id=KeyedInstanceIdentifier{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.mdsal.core.general.entity.rev150930.Entity, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.mdsal.core.general.entity.rev150930.Entity[key=EntityKey [_name=openflow:139226422832200]]]}], state=REMOTE_OWNERSHIP_CHANGED [wasOwner=false, isOwner=false, hasOwner=true], inJeopardy=false]
2018-04-20T06:23:01,132 | INFO  | ResourceBatchingManager-0 | ResourceBatchingManager          | 258 - org.opendaylight.genius.mdsalutil-api - 0.4.1 | Total taken ##time = 153ms for resourceList of size 1 for resourceType ITM-DEFAULT-CONFIG
2018-04-20T06:23:01,269 | INFO  | pool-236-thread-1 | SnatNodeEventListener            | 354 - org.opendaylight.netvirt.natservice-impl - 0.6.1 | Dpn added 139226422832200

 

 

The second exception, where it failed to send the LLDP packet, failed for a similar reason.

Although this module is not cluster aware, that actually should help in this case. Given that all three instances of the module will try to send this message down to the switch, the master instance, where the RPCs are registered, should succeed.

So if you look at the following log, the device connected to all the controllers at 2018-04-20T06:23:01, and you see these exceptions only on controller 1 and controller 3, but not on controller 2, because controller 2 was the master of the device, so the RPC call went fine.

2018-04-20T06:23:01,090 | INFO | epollEventLoopGroup-9-2 | DeviceManagerImpl | 383 - org.opendaylight.openflowplugin.impl - 0.6.1 | ConnectionEvent: Device connected to controller, Device:/10.30.170.142:44160, NodeId:Uri [_value=openflow:139226422832200]

As per my understanding of the logs, this exception should not cause any functional failure for netvirt, because controller 2 was able to push the call to the device.

But irrespective of that, I think this module needs to be cluster aware, so that it always uses the local RPC rather than the routed RPC; that will avoid these unwarranted exceptions.

Comment by Jamo Luhrsen [ 04/May/18 ]

Avishnoi thanks for the detailed explanation. It explains a lot of what I was noticing, like how these things are not
coming in any consistent manner. It's all related to timing in several areas.

It would be good to make this module cluster aware as we are hopefully starting to get some momentum with
cluster stability (crossing fingers). Even if functionality is not adversely affected, the logs will be scary to an
end user who doesn't have the master-level understanding that guys like you have.

It sounds like it's probably a lower priority thing to work on though. Do you know what kind of task it would be?
small, medium, large?

Again, thanks for the awesome answer.

Comment by Jamo Luhrsen [ 24/May/18 ]

btw, this is happening a lot in this log:

https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/openflowplugin-csit-3node-clustering-only-fluorine/83/odl3_karaf.log.gz

 

it's definitely something scary to see.

Comment by Anil Vishnoi [ 03/Jun/18 ]

jluhrsen I think the effort to make it cluster aware is not really huge. I will see if someone is willing to pick this up at the next openflow plugin meeting.

Comment by Sam Hague [ 09/Jul/18 ]

gobinath Arunprakash could you update us on where this Jira stands? This issue seems to be one of the biggest issues in the netvirt CSIT right now. It causes many exceptions, so it would be good to clean it up.

Comment by Faseela K [ 10/Jul/18 ]

gobinath Arunprakash : As far as I understand, netvirt does not have a usecase where LLDP topology discovery is needed. Can we make it a separate feature, so that it won't come by default in the main feature used by applications?

Also, please confirm whether the exceptions for add-flow are seen only for LLDP-related flows, so that we can ignore these errors for now.

Comment by Anil Vishnoi [ 10/Jul/18 ]

k.faseela There are two issues here:

(1) The of-switch-config-pusher application sends the switch-config message to the device through an RPC when it connects, and it looks like it's failing because the routed RPC is not yet published.

(2) The LLDP speaker is sending the LLDP packet, but the transmit-packet RPC is not available (probably because of a device disconnection).

 

For (1), I believe it's not causing any functional issue as such, because all the controllers are pushing the switch-config message to the switch. But given that all the controllers are pushing the message, it's unnecessary. So to avoid it we need to make the module cluster-aware, and I believe gobinath is working on it.

Note that making a module cluster aware does not guarantee that these exceptions won't be seen. You can still see the issue if you try to use a routed RPC that is not yet published to all the nodes, or if the device disconnects while your message is in flight.

Regarding (2): The LLDP speaker is already cluster aware, so the only reason I have seen it throw DOMRpcImplementationNotAvailableException is that the RPC call is in flight when the device gets disconnected. I am not sure there is any other way to coordinate RPC execution with RPC registration closure, given that it's mediated by MD-SAL/CDS. There are two possible solutions here:

(1) As you mentioned, NetVirt doesn't use LLDP-speaker-based discovery, so you can just load topology-service and skip lldp-speaker and topology-lldp-discovery. Currently all of these applications are bundled in a single feature (odl-openflowplugin-app-topology), so I pushed a patch to separate these features so users can load the ones they want. Please refer to OPNFLWPLUG-1024 for more details and the following gerrit:

https://git.opendaylight.org/gerrit/#/c/73870/

 

(2) In a clustered environment, whenever a split brain happens it causes churn on the controllers, or there is a possibility that the owner controller won't respond to the switch's echo request within 2 seconds, which can lead to the switch disconnecting from its side. I would suggest setting the inactivity_probe time in OVS to 30 seconds or so (the value is expressed in milliseconds, so 30000); that will help avoid device disconnections caused by a missed echo response within 2 seconds. Here is the script for your reference:

#!/bin/sh
# Set inactivity_probe (in milliseconds) on every OVS Controller record.
# Usage: set-inactivity-probe.sh <milliseconds>
x=$(sudo ovs-vsctl --columns=_uuid list Controller | awk '{print $NF}')
echo "$x"
for i in $x
do
    sudo ovs-vsctl set Controller "$i" inactivity_probe=$1
done

In our experience, increasing the inactivity_probe time helped us improve the stability of the cluster environment and also of the scale environment (we have set the inactivity_probe time in the upstream openflowplugin CSIT tests).
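For completeness, the result can be checked on the OVS side. This is a sketch that assumes an OVS host; the script file name is illustrative (Anil's script above is unnamed), and the columns used are the standard OVSDB Controller table columns:

```shell
# Set a 30 s probe (30000 ms) on every controller record using the script above,
# assuming it was saved as set-inactivity-probe.sh (hypothetical name).
sudo sh set-inactivity-probe.sh 30000

# Verify: the inactivity_probe column should now read 30000 for each controller.
sudo ovs-vsctl --columns=target,inactivity_probe list Controller
```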

Comment by Gobinath Suganthan [ 10/Jul/18 ]

Avishnoi I've started working on solving the exceptions in the LLDPSpeaker by sending LLDP packets (transmit-packet) only from the owner node, by checking the EOS. There is already a "DeviceOwnershipStatusService" in the LLDPSpeaker which can be used to check this. I've triggered CSIT on top of the patch to check whether it solves the exceptions seen in lldp-speaker. We could also use this to prevent the exceptions occurring in other modules (table-miss-enforcer and of-switch-config-pusher) after making them cluster-aware.

Comment by Faseela K [ 10/Jul/18 ]

Avishnoi : Thanks, will take a look at https://git.opendaylight.org/gerrit/#/c/73870/ . This will be the first step for us.

Comment by Faseela K [ 10/Jul/18 ]

jluhrsen : Do we set inactivity probe in OVS, as Avishnoi has mentioned? If not, shall we add the configuration?

Comment by Sam Hague [ 10/Jul/18 ]

We do not currently set it in netvirt CSIT. The value can be set using elan-config.xml as shown below. Above, Avishnoi showed how the ofp CSIT sets the value directly on the switches using ovs-vsctl. I can add a patch to do the below for netvirt clustering CSIT.

<elanmanager-config xmlns="urn:opendaylight:netvirt:elan:config">
  <auto-create-bridge>true</auto-create-bridge>
  <int-bridge-gen-mac>true</int-bridge-gen-mac>
  <temp-smac-learn-timeout>10</temp-smac-learn-timeout>
  <arp-punt-timeout>5</arp-punt-timeout>
  <punt-lldp-to-controller>false</punt-lldp-to-controller>
  <!--
  <controller-max-backoff>5000</controller-max-backoff>
  <controller-inactivity-probe>5000</controller-inactivity-probe>
  -->
  <auto-config-transport-zones>false</auto-config-transport-zones>
  <use-of-tunnels>false</use-of-tunnels>
  <openstack-vni-semantics-enforced>true</openstack-vni-semantics-enforced>
  <l2gw-stale-vlan-cleanup-delay-secs>900</l2gw-stale-vlan-cleanup-delay-secs>
  <l2gw-logical-switch-delay-secs>20</l2gw-logical-switch-delay-secs>
</elanmanager-config>

Comment by Sam Hague [ 10/Jul/18 ]

Avishnoi in the ofp CSIT, the inactivity_probe is set to 0 and the comment says this disables the echoes. Should we disable it as well, or use the 30s?

    Comment    Below line disables switch echos
    Write    sh x=`sudo ovs-vsctl --columns=_uuid list Controller | awk '{print $NF}'`; for i in $x; do sudo ovs-vsctl set Controller $i inactivity_probe=0; done

Comment by Anil Vishnoi [ 10/Jul/18 ]

shague Either should work. If you disable the inactivity probe on the switch side, the controller will still send echo requests to check connection aliveness, which is safer because it goes according to the controller's state at that point in time. But for a production environment it's not recommended to disable inactivity_probe, because there is a possibility that the connection from switch to controller goes into a bad state (e.g. packets from switch to controller are dropped while the other direction works fine).

Comment by Anil Vishnoi [ 10/Jul/18 ]

gobinath I think your patch looks good, but I think we should not send the packet in the nodeConnectorAdded() method, because the run() method is already doing that for you. So can you update your patch to get rid of the transmitPacket() part of the code?

Comment by Gobinath Suganthan [ 12/Jul/18 ]

Avishnoi We don't use the lldp module and I'm not aware of the design. But my understanding is that the periodic thread will run after initialization, and only the node connectors available when the node initially connected would be sending LLDP packets. In the case of a newly added node connector, we have to update the monitoring to include the new node connector too. In the existing implementation they explicitly want to send the LLDP packets from the newly added node connectors before the scheduled thread pitches in. If there is no need for immediately detecting the new node connector in the topology, we could do away with sending packets in the nodeConnectorAdded() method.

Generated at Wed Feb 07 20:33:57 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.