[OPNFLWPLUG-1035] Not able to set MASTER role on device openflow:206931627343278, reason: java.lang.RuntimeException: Failed to initialize device openflow:206931627343278 in 9s Created: 17/Sep/18  Updated: 22/Nov/19  Resolved: 07/Jan/19

Status: Resolved
Project: OpenFlowPlugin
Component/s: openflowplugin-impl
Affects Version/s: Neon
Fix Version/s: Fluorine-SR2, Neon

Type: Bug Priority: Medium
Reporter: Sam Hague Assignee: Somashekhar Javalagi
Resolution: Done Votes: 0
Labels: csit:3node, csit:exception, csit:failures
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: ODL connection issue_neon.log, odl1_karaf.log.tar.xz
Issue Links:
Duplicate
duplicates OPNFLWPLUG-1039 Not able to set SLAVE role on device ... Resolved
Relates
relates to OPNFLWPLUG-1033 org.opendaylight.openflowplugin.appli... Resolved
relates to OPNFLWPLUG-1034 NullPointerException: null at org.ope... Resolved
Epic Link: Clustering Stability

 Description   

During 3-node testing using graceful start and stop, the following exception is seen. Graceful start and stop means using the bin/stop and bin/start commands to stop and start ODL rather than kill -9, so the bundles are stopped in an orderly fashion and each bundle is destroyed. Once some of the ovsdb bundles are destroyed, the exceptions start.

The flow: all three ODLs are up. ODL1 is shut down via bin/stop, then started again; ODL2 is stopped and started the same way. The cluster is healthy at this point. Then ODL3, which happens to be the leader, is taken down. The exception below repeats until ODL3 is restarted.
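For reference, the restart sequence above can be sketched as follows (a sketch only; the hostnames and the /opt/opendaylight install path are illustrative assumptions, not from the logs):

```shell
# Rolling graceful restart used in the 3-node test (illustrative paths/hosts).
# All three ODL instances start out healthy.

# Restart ODL1 gracefully (orderly bundle shutdown, unlike kill -9).
ssh odl1 '/opt/opendaylight/bin/stop'
ssh odl1 '/opt/opendaylight/bin/start'

# Restart ODL2 the same way.
ssh odl2 '/opt/opendaylight/bin/stop'
ssh odl2 '/opt/opendaylight/bin/start'

# Cluster is healthy here. Take down ODL3 (the current leader);
# the MASTER-role exception then repeats until ODL3 is restarted.
ssh odl3 '/opt/opendaylight/bin/stop'
```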

https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/builder-copy-sandbox-logs/408/shague-haproxy-netvirt-csit-3node-0cmb-1ctl-2cmp-openstack-queens-upstream-stateful-neon/8/odl_1/odl1_karaf.log.gz

2018-09-17T17:04:37,848 | WARN  | ofppool-6        | ContextChainHolderImpl           | 375 - org.opendaylight.openflowplugin.impl - 0.8.0.SNAPSHOT | Not able to set MASTER role on device openflow:206931627343278, reason: java.lang.RuntimeException: Failed to initialize device openflow:206931627343278 in 9s: java.util.concurrent.TimeoutException: Waited 9000 milliseconds for com.google.common.util.concurrent.AbstractTransformFuture$AsyncTransformFuture@2c0a4d2e[status=PENDING, info=[inputFuture=[com.google.common.util.concurrent.AbstractTransformFuture$TransformFuture@53ef55d8[status=PENDING, info=[inputFuture=[com.google.common.util.concurrent.SettableFuture@7e4ef304[status=PENDING]], function=[org.opendaylight.openflowplugin.impl.device.initialization.OF13DeviceInitializer$$Lambda$1883/564334312@79b2c2e7]]]], function=[org.opendaylight.openflowplugin.impl.device.initialization.OF13DeviceInitializer$$Lambda$1885/836204643@7af83c2]]]


 Comments   
Comment by Somashekhar Javalagi [ 31/Oct/18 ]

Not able to get any clue from the logs.

shague, can you reproduce the issue with the following loggers enabled in the Karaf console?

log:set DEBUG org.opendaylight.openflowplugin.impl.device.initialization.OF13DeviceInitializer

log:set DEBUG org.opendaylight.openflowplugin.impl.device.initialization.AbstractDeviceInitializer

Comment by Luis Gomez [ 14/Nov/18 ]

What about these WARNs in the log? They seem to happen before the MASTER role failure:

2018-09-17T17:43:33,582 | WARN  | pool-77-thread-1 | OutboundQueueEntry               | 384 - org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl - 0.8.0.SNAPSHOT | Can't commit a completed message.
2018-09-17T17:43:33,582 | WARN  | pool-77-thread-1 | HashedWheelTimer                 | 55 - io.netty.common - 4.1.22.Final | An exception was thrown by TimerTask.
java.lang.NullPointerException: null
	at org.opendaylight.openflowplugin.impl.services.AbstractRequestCallback.onFailure(AbstractRequestCallback.java:72) ~[?:?]
	at org.opendaylight.openflowjava.protocol.impl.core.connection.OutboundQueueEntry.commit(OutboundQueueEntry.java:57) ~[384:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.8.0.SNAPSHOT]
	at org.opendaylight.openflowjava.protocol.impl.core.connection.StackedOutboundQueue.commitEntry(StackedOutboundQueue.java:36) ~[384:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.8.0.SNAPSHOT]
	at org.opendaylight.openflowjava.protocol.impl.core.connection.AbstractStackedOutboundQueue.commitEntry(AbstractStackedOutboundQueue.java:60) ~[384:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.8.0.SNAPSHOT]
	at org.opendaylight.openflowplugin.impl.connection.OutboundQueueProviderImpl.commitEntry(OutboundQueueProviderImpl.java:81) ~[?:?]
	at org.opendaylight.openflowplugin.impl.services.AbstractService.handleServiceCall(AbstractService.java:165) ~[?:?]
	at org.opendaylight.openflowplugin.impl.services.AbstractService.handleServiceCall(AbstractService.java:110) ~[?:?]
	at org.opendaylight.openflowplugin.impl.services.RoleService.getGenerationIdFromDevice(RoleService.java:69) ~[?:?]
	at org.opendaylight.openflowplugin.impl.services.sal.SalRoleServiceImpl.tryToChangeRole(SalRoleServiceImpl.java:84) ~[?:?]
	at org.opendaylight.openflowplugin.impl.services.sal.SalRoleServiceImpl.setRole(SalRoleServiceImpl.java:78) ~[?:?]
	at org.opendaylight.openflowplugin.impl.role.RoleContextImpl.sendRoleChangeToDevice(RoleContextImpl.java:165) ~[?:?]
	at org.opendaylight.openflowplugin.impl.role.RoleContextImpl.makeDeviceSlave(RoleContextImpl.java:144) ~[?:?]
	at org.opendaylight.openflowplugin.impl.role.RoleContextImpl.lambda$new$0(RoleContextImpl.java:70) ~[?:?]
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:663) [55:io.netty.common:4.1.22.Final]
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:738) [55:io.netty.common:4.1.22.Final]
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:466) [55:io.netty.common:4.1.22.Final]
	at java.lang.Thread.run(Thread.java:748) [?:?]

Comment by Anil Vishnoi [ 07/Jan/19 ]

Fixed through OPNFLWPLUG-1039.

Comment by Thomas Ferrandiz [ 24/Jun/19 ]

Hello,

I have had the same issue with the Neon release (see attached file).

The setup is one ODL instance and one OVS instance.

Steps to reproduce:

  • connect the OVS to ODL (with ovs-vsctl set-controller mybridge tcp:127.0.0.1)
  • disconnect OVS from ODL (ovs-vsctl del-controller mybridge)
  • reconnect OVS to ODL
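The steps above, collected into one sequence (a sketch; assumes ovs-vsctl is on the path and the bridge is named mybridge as in the steps — the controller address is copied verbatim from the comment):

```shell
# Reproduce with a single ODL instance and a single OVS bridge.
ovs-vsctl set-controller mybridge tcp:127.0.0.1   # connect OVS to ODL
ovs-vsctl del-controller mybridge                 # disconnect OVS from ODL
ovs-vsctl set-controller mybridge tcp:127.0.0.1   # reconnect -- this step fails
```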

The last step fails, and there is an infinite loop of reconnections and disconnections in the logs.

ODL connection issue_neon.log

Comment by Tejas Nevrekar [ 22/Nov/19 ]

tferrandiz, I saw a similar symptom. It went away after installing odl-openflowplugin-app-arbitratorreconciliation. Could you please check whether that helps in your case?
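For anyone trying this workaround, the feature is installed from the Karaf console (a sketch; the feature name is taken from the comment above, and the console prompt is illustrative):

```shell
# Run inside the Karaf console of the ODL instance, not the OS shell:
feature:install odl-openflowplugin-app-arbitratorreconciliation
```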

Generated at Wed Feb 07 20:34:01 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.