[OPNFLWPLUG-1035] Not able to set MASTER role on device openflow:206931627343278, reason: java.lang.RuntimeException: Failed to initialize device openflow:206931627343278 in 9s Created: 17/Sep/18 Updated: 22/Nov/19 Resolved: 07/Jan/19 |
|
| Status: | Resolved |
| Project: | OpenFlowPlugin |
| Component/s: | openflowplugin-impl |
| Affects Version/s: | Neon |
| Fix Version/s: | Fluorine-SR2, Neon |
| Type: | Bug | Priority: | Medium |
| Reporter: | Sam Hague | Assignee: | Somashekhar Javalagi |
| Resolution: | Done | Votes: | 0 |
| Labels: | csit:3node, csit:exception, csit:failures | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Issue Links: |
|
| Epic Link: | Clustering Stability |
| Description |
|
During 3-node testing using graceful start and stop, the following exception is seen. Graceful start and stop means using the bin/stop and bin/start commands to stop and start ODL rather than kill -9, so the bundles are stopped in an orderly way and each bundle is destroyed. Some bundles in ovsdb are destroyed and then the exceptions start.

The flow: all three ODLs are up. Shut down ODL1 via bin/stop, then start ODL1. Stop ODL2, then start ODL2. The cluster is good, then take down ODL3, which happens to be the leader. The exception below repeats until ODL3 is restarted.

2018-09-17T17:04:37,848 | WARN | ofppool-6 | ContextChainHolderImpl | 375 - org.opendaylight.openflowplugin.impl - 0.8.0.SNAPSHOT | Not able to set MASTER role on device openflow:206931627343278, reason: java.lang.RuntimeException: Failed to initialize device openflow:206931627343278 in 9s: java.util.concurrent.TimeoutException: Waited 9000 milliseconds for com.google.common.util.concurrent.AbstractTransformFuture$AsyncTransformFuture@2c0a4d2e[status=PENDING, info=[inputFuture=[com.google.common.util.concurrent.AbstractTransformFuture$TransformFuture@53ef55d8[status=PENDING, info=[inputFuture=[com.google.common.util.concurrent.SettableFuture@7e4ef304[status=PENDING]], function=[org.opendaylight.openflowplugin.impl.device.initialization.OF13DeviceInitializer$$Lambda$1883/564334312@79b2c2e7]]]], function=[org.opendaylight.openflowplugin.impl.device.initialization.OF13DeviceInitializer$$Lambda$1885/836204643@7af83c2]]] |
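The WARN above has the shape of a bounded wait on a future that is still PENDING when the deadline passes. The following is a minimal sketch of that pattern only, not the plugin's code: the class `InitTimeoutSketch`, the helper `awaitInit`, and the use of `CompletableFuture` in place of Guava futures are all assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class InitTimeoutSketch {

    // Hypothetical helper (not an OpenFlowPlugin API): waits a bounded time for a
    // device-initialization future and wraps a timeout in a RuntimeException,
    // similar in shape to the exception chain in the WARN above.
    static void awaitInit(String deviceId, CompletableFuture<Void> initFuture, long timeoutMillis) {
        try {
            initFuture.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            throw new RuntimeException(
                "Failed to initialize device " + deviceId + " in " + timeoutMillis + "ms", e);
        } catch (InterruptedException e) {
            // Restore the interrupt flag before propagating.
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted while initializing " + deviceId, e);
        } catch (ExecutionException e) {
            throw new RuntimeException("Initialization of " + deviceId + " failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        // A future that nothing ever completes stays pending, like status=PENDING in the log.
        CompletableFuture<Void> neverCompleted = new CompletableFuture<>();
        try {
            awaitInit("openflow:206931627343278", neverCompleted, 100);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage() + ": " + e.getCause().getClass().getName());
        }
    }
}
```

In this sketch the timeout only reports that the innermost future was never completed. It does not say why, which is consistent with the request below for DEBUG logs from the initializer classes.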
| Comments |
| Comment by Somashekhar Javalagi [ 31/Oct/18 ] |
|
I could not find any clue in the logs. shague, can you reproduce the issue with the following debug logging enabled in Karaf?
log:set DEBUG org.opendaylight.openflowplugin.impl.device.initialization.OF13DeviceInitializer
log:set DEBUG org.opendaylight.openflowplugin.impl.device.initialization.AbstractDeviceInitializer |
| Comment by Luis Gomez [ 14/Nov/18 ] |
|
What about these WARNs in the log? They seem to happen before the MASTER role failure:

2018-09-17T17:43:33,582 | WARN | pool-77-thread-1 | OutboundQueueEntry | 384 - org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl - 0.8.0.SNAPSHOT | Can't commit a completed message.
2018-09-17T17:43:33,582 | WARN | pool-77-thread-1 | HashedWheelTimer | 55 - io.netty.common - 4.1.22.Final | An exception was thrown by TimerTask.
java.lang.NullPointerException: null
    at org.opendaylight.openflowplugin.impl.services.AbstractRequestCallback.onFailure(AbstractRequestCallback.java:72) ~[?:?]
    at org.opendaylight.openflowjava.protocol.impl.core.connection.OutboundQueueEntry.commit(OutboundQueueEntry.java:57) ~[384:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.8.0.SNAPSHOT]
    at org.opendaylight.openflowjava.protocol.impl.core.connection.StackedOutboundQueue.commitEntry(StackedOutboundQueue.java:36) ~[384:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.8.0.SNAPSHOT]
    at org.opendaylight.openflowjava.protocol.impl.core.connection.AbstractStackedOutboundQueue.commitEntry(AbstractStackedOutboundQueue.java:60) ~[384:org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl:0.8.0.SNAPSHOT]
    at org.opendaylight.openflowplugin.impl.connection.OutboundQueueProviderImpl.commitEntry(OutboundQueueProviderImpl.java:81) ~[?:?]
    at org.opendaylight.openflowplugin.impl.services.AbstractService.handleServiceCall(AbstractService.java:165) ~[?:?]
    at org.opendaylight.openflowplugin.impl.services.AbstractService.handleServiceCall(AbstractService.java:110) ~[?:?]
    at org.opendaylight.openflowplugin.impl.services.RoleService.getGenerationIdFromDevice(RoleService.java:69) ~[?:?]
    at org.opendaylight.openflowplugin.impl.services.sal.SalRoleServiceImpl.tryToChangeRole(SalRoleServiceImpl.java:84) ~[?:?]
    at org.opendaylight.openflowplugin.impl.services.sal.SalRoleServiceImpl.setRole(SalRoleServiceImpl.java:78) ~[?:?]
    at org.opendaylight.openflowplugin.impl.role.RoleContextImpl.sendRoleChangeToDevice(RoleContextImpl.java:165) ~[?:?]
    at org.opendaylight.openflowplugin.impl.role.RoleContextImpl.makeDeviceSlave(RoleContextImpl.java:144) ~[?:?]
    at org.opendaylight.openflowplugin.impl.role.RoleContextImpl.lambda$new$0(RoleContextImpl.java:70) ~[?:?]
    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:663) [55:io.netty.common:4.1.22.Final]
    at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:738) [55:io.netty.common:4.1.22.Final]
    at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:466) [55:io.netty.common:4.1.22.Final]
    at java.lang.Thread.run(Thread.java:748) [?:?] |
| Comment by Anil Vishnoi [ 07/Jan/19 ] |
|
Fixed through |
| Comment by Thomas Ferrandiz [ 24/Jun/19 ] |
|
Hello,
I have had the same issue with the Neon release (see file attached). The setup is one ODL instance and one OVS instance. Steps to reproduce:
The last step fails and there is an infinite loop of reconnections and disconnections in the logs.
|
| Comment by Tejas Nevrekar [ 22/Nov/19 ] |
|
tferrandiz, I saw a similar symptom. It went away after installing odl-openflowplugin-app-arbitratorreconciliation. Could you please check whether that helps in your case? |