[OPNFLWPLUG-868] br-sfc become slave and disconnect from ODL Created: 09/Mar/17  Updated: 27/Sep/21  Resolved: 03/Jul/17

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Mohamed ElSerngawy Assignee: Mohamed ElSerngawy
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Linux
Platform: PC


Attachments: HTML File Vagrantfile     Zip Archive logs.zip     Text File relatedlog.txt    
Issue Links:
Duplicate
is duplicated by OPNFLWPLUG-873 Was not able to set role SLAVE to dev... Resolved
External issue ID: 7948

 Description   

After installing the odl-openflowplugin-flow-services-ui feature in ODL and then trying to connect a bridge "br-sfc" in an OVS instance (ovs_version: "2.5.90"), ODL was not able to connect, showing the log errors at [1]. This was tested with the OpenDaylight Boron SR2 release and a recent Carbon distribution [0]. To reproduce the issue, you need at least 4 bridges trying to connect to ODL.

[0] https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/distribution-karaf/0.6.0-SNAPSHOT/distribution-karaf-0.6.0-20170309.102115-4296.zip

[1] 2017-03-09 21:54:05,446 | INFO | pool-16-thread-1 | DeviceMastership | 236 - org.opendaylight.openflowplugin.applications.forwardingrules-manager - 0.4.0.SNAPSHOT | FRM started for: openflow:235989041245259
2017-03-09 21:54:05,538 | INFO | entLoopGroup-5-1 | RoleService | 228 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | submitRoleChange onSuccess for device:Uri [_value=openflow:235989041245259], role:BECOMEMASTER
2017-03-09 21:54:07,506 | WARN | pool-19-thread-1 | DeviceContextImpl | 228 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | New role BECOMESLAVE was not propagated to device openflow:200584538364992 during 10 sec
2017-03-09 21:54:07,515 | ERROR | pool-19-thread-1 | SalRoleServiceImpl | 228 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | SetRoleService set Role BECOMESLAVE for Node: KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:200584538364992]]]]}

fail . Reason java.util.concurrent.CancellationException: Task was cancelled.
2017-03-09 21:54:07,515 | WARN | pool-19-thread-1 | DeviceManagerImpl | 228 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | Was not able to set role SLAVE to device on node openflow:200584538364992
2017-03-09 21:54:10,542 | INFO | entLoopGroup-5-1 | ConnectionAdapterImpl | 217 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.9.0.SNAPSHOT | Hello received
2017-03-09 21:54:10,550 | INFO | entLoopGroup-5-1 | DeviceManagerImpl | 228 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | ConnectionEvent: Device connected to controller, Device:/10.10.1.60:38467, NodeId:Uri [_value=openflow:200584538364992]
2017-03-09 21:54:10,554 | INFO | entLoopGroup-5-1 | SalRoleServiceImpl | 228 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | SetRole called with input:SetRoleInput [_controllerRole=BECOMESLAVE, _node=NodeRef [_value=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:200584538364992]]]]}

], augmentation=[]]
2017-03-09 21:54:10,559 | INFO | entLoopGroup-5-1 | SalRoleServiceImpl | 228 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | Requesting state change to BECOMESLAVE
2017-03-09 21:54:10,560 | INFO | entLoopGroup-5-1 | SalRoleServiceImpl | 228 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | RoleChangeTask called on device:openflow:200584538364992 OFPRole:BECOMESLAVE
2017-03-09 21:54:10,563 | INFO | entLoopGroup-5-1 | RoleService | 228 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | getGenerationIdFromDevice called for device: openflow:200584538364992
2017-03-09 21:54:20,573 | WARN | pool-19-thread-1 | DeviceContextImpl | 228 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | New role BECOMESLAVE was not propagated to device openflow:200584538364992 during 10 sec
2017-03-09 21:54:20,573 | ERROR | pool-19-thread-1 | SalRoleServiceImpl | 228 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | SetRoleService set Role BECOMESLAVE for Node: KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:200584538364992]]]]}

fail . Reason java.util.concurrent.CancellationException: Task was cancelled.
2017-03-09 21:54:20,573 | WARN | pool-19-thread-1 | DeviceManagerImpl | 228 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | Was not able to set role SLAVE to device on node openflow:200584538364992
2017-03-09 21:55:00,663 | INFO | pool-19-thread-1 | StatisticsManagerImpl | 228 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | Statistics collection for node openflow:42760610094664 still in progress even after 50 secs
2017-03-09 21:55:00,672 | WARN | pool-19-thread-1 | StatisticsManagerImpl | 228 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | Statistics gathering for single node openflow:42760610094664 was not successful: Task was cancelled.
2017-03-09 21:55:01,582 | INFO | pool-19-thread-1 | StatisticsManagerImpl | 228 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | Statistics collection for node openflow:235989041245259 still in progress even after 50 secs
2017-03-09 21:55:01,591 | WARN | pool-19-thread-1 | StatisticsManagerImpl | 228 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | Statistics gathering for single node openflow:235989041245259 was not successful: Task was cancelled.



 Comments   
Comment by Tomas Slusny [ 14/Mar/17 ]

I noticed that you are using OVS with some custom patches (https://github.com/yyang13/ovs_nsh_patches). Did you try running it with a clean OVS?

Comment by Mohamed ElSerngawy [ 14/Mar/17 ]

(In reply to Tomas Slusny from comment #1)
> I noticed that you are using OVS with some custom patches
> (https://github.com/yyang13/ovs_nsh_patches). Did you try running it with a
> clean OVS?

Hi,
I'm using ovs_nsh because I'm working with SFC, so I have to use it. One thing that could help: after reproducing the issue, if I stop/start ODL, the br-sfc bridges connect fine.

Comment by Tomas Slusny [ 15/Mar/17 ]

I tried reproducing it by following exactly your steps about 3 times, and it worked perfectly; I did not find any errors in the log, and is_connected was true on the controller.

Comment by Mohamed ElSerngawy [ 15/Mar/17 ]

Attachment Vagrantfile has been added with description: reproduce the env using this vagrant file

Comment by Mohamed ElSerngawy [ 15/Mar/17 ]

(In reply to Tomas Slusny from comment #3)
> I tried reproducing it by following exactly your steps about 3 times, and it
> worked perfectly; I did not find any errors in the log, and is_connected was
> true on the controller.

Well, did you create the env using the Vagrantfile I attached to the email thread? I also attached it to the bug, so maybe you can try it.

Comment by Tomas Slusny [ 15/Mar/17 ]

Yes, I reproduced it using the Vagrantfile that you attached in the email (but I had to switch from amd64 to i386 Ubuntu, because I was running a VM in a VM so it was not possible to emulate amd64); it should not cause any difference. For reference, here is my slightly modified Vagrantfile, adapted to my hardware: https://hastebin.com/iqekuzeruj

Comment by Mohamed ElSerngawy [ 15/Mar/17 ]

(In reply to Tomas Slusny from comment #6)
> Yes, I reproduced it using the Vagrantfile that you attached in the email
> (but I had to switch from amd64 to i386 Ubuntu, because I was running a VM in
> a VM so it was not possible to emulate amd64); it should not cause any
> difference. For reference, here is my slightly modified Vagrantfile, adapted
> to my hardware: https://hastebin.com/iqekuzeruj

I don't really know what to say; at least 3 of us tried it and we were able to reproduce it. I agree that the i386 change should not have an effect.

Comment by Jamo Luhrsen [ 21/Mar/17 ]

This has happened again in NetVirt upstream CSIT:

robot report:
https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-mitaka-upstream-transparent-boron/503/archives/log.html.gz

karaf log:
https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-mitaka-upstream-transparent-boron/503/archives/odl1_karaf.log.gz

Comment by Jamo Luhrsen [ 27/Mar/17 ]

(In reply to Jamo Luhrsen from comment #8)
> This has happened again in NetVirt upstream CSIT:
>
> robot report:
> https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-mitaka-upstream-transparent-boron/503/archives/log.html.gz
>
> karaf log:
> https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-mitaka-upstream-transparent-boron/503/archives/odl1_karaf.log.gz

again...

https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-newton-nodl-v2-upstream-stateful-carbon/260/archives/log.html.gz

Comment by Tomas Slusny [ 27/Mar/17 ]

Is it possible to try this on Jozef's patches for the split connection layer? https://bugs.opendaylight.org/show_bug.cgi?id=6802. With those patches we are no longer sending SLAVE on start, so it is possible that this bug will be fixed by OPNFLWPLUG-773.

I will try to reproduce this bug again in my local environment and test it on Jozef's patches myself if I can reproduce it, but last time I spent 2 days just trying to reproduce it without any success, so it would help to have some simple reproduction steps that guarantee the bug will occur.

Comment by Abhijit Kumbhare [ 12/Apr/17 ]

Changed the priority to critical.

Comment by Jozef Bacigal [ 20/Apr/17 ]

I am now confused: is it something in OVS, in the controller, or in the plugin? Did we get to any conclusion? Anyway, Jamo or Mohamed, if this is still an issue, can you please try it on the current master or carbon branch with org.opendaylight.openflowplugin.impl set to DEBUG and send me the karaf logs?

Comment by Guy Sela [ 20/Apr/17 ]

This is a reproduction on Boron SR3 with some extra downstream code. It might help.

Comment by Guy Sela [ 20/Apr/17 ]

Attachment relatedlog.txt has been added with description: LOG with TRACE enabled

Comment by Guy Sela [ 20/Apr/17 ]

I think the problem is that the thread is stuck here:
"epollEventLoopGroup-9-5" #352 prio=10 os_prio=0 tid=0x00007f1d5c035800 nid=0x4b2a waiting on condition [0x00007f1d3eb3a000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)

  • parking to wait for <0x000000009090c2a0> (a java.util.concurrent.Semaphore$FairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
    at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
    at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.registerService(ClusterSingletonServiceGroupImpl.java:165)
    at org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl.registerClusterSingletonService(AbstractClusterSingletonServiceProviderImpl.java:106)
    at Proxyc35cabb5_c759_4813_92fa_04e3d78ebdb4.registerClusterSingletonService(Unknown Source)
    at Proxydf546731_1b69_4412_ab0b_f8719b2ae290.registerClusterSingletonService(Unknown Source)
    at org.opendaylight.openflowplugin.impl.lifecycle.LifecycleServiceImpl.registerService(LifecycleServiceImpl.java:137)
    at org.opendaylight.openflowplugin.impl.device.DeviceManagerImpl.onDeviceContextLevelUp(DeviceManagerImpl.java:171)
    at org.opendaylight.openflowplugin.impl.rpc.RpcManagerImpl.onDeviceContextLevelUp(RpcManagerImpl.java:87)
    at org.opendaylight.openflowplugin.impl.statistics.StatisticsManagerImpl.onDeviceContextLevelUp(StatisticsManagerImpl.java:121)
    at org.opendaylight.openflowplugin.impl.device.DeviceManagerImpl.deviceConnected(DeviceManagerImpl.java:249)
    at org.opendaylight.openflowplugin.impl.connection.listener.HandshakeListenerImpl$1.onSuccess(HandshakeListenerImpl.java:75)
    at org.opendaylight.openflowplugin.impl.connection.listener.HandshakeListenerImpl$1.onSuccess(HandshakeListenerImpl.java:68)
    at com.google.common.util.concurrent.Futures$6.run(Futures.java:1319)
    at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
    at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
    at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
    at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:185)
    at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:53)
    at org.opendaylight.openflowjava.protocol.impl.core.connection.AbstractRpcListener.successfulRpc(AbstractRpcListener.java:88)
    at org.opendaylight.openflowjava.protocol.impl.core.connection.ResponseExpectedRpcListener.completed(ResponseExpectedRpcListener.java:38)
    at org.opendaylight.openflowjava.protocol.impl.core.connection.ConnectionAdapterImpl.consumeDeviceMessage(ConnectionAdapterImpl.java:141)
    at org.opendaylight.openflowjava.protocol.impl.core.connection.AbstractConnectionAdapterStatistics.consume(AbstractConnectionAdapterStatistics.java:66)
    at org.opendaylight.openflowjava.protocol.impl.core.connection.ConnectionAdapterImpl.consume(ConnectionAdapterImpl.java:43)
    at org.opendaylight.openflowjava.protocol.impl.core.DelegatingInboundHandler.channelRead(DelegatingInboundHandler.java:46)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)

Comment by Guy Sela [ 20/Apr/17 ]

After further examination, this problem seems to be caused by our extra downstream code. You should, however, try running "pkill -QUIT java" and examining the thread stack traces to see if you are suffering from the same issue.
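The same information a `pkill -QUIT java` thread dump prints can also be collected programmatically via the standard ThreadMXBean API. A minimal, self-contained sketch (the parked thread below is a stand-in for illustration, not OpenFlowPlugin code):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class StuckThreadCheck {
    public static void main(String[] args) throws Exception {
        // A deliberately parked thread standing in for the stuck epollEventLoopGroup thread
        CountDownLatch never = new CountDownLatch(1);
        Thread stuck = new Thread(() -> {
            try {
                never.await();
            } catch (InterruptedException ignored) {
            }
        }, "demo-stuck-thread");
        stuck.setDaemon(true);
        stuck.start();
        Thread.sleep(200); // give it time to park

        // Print every WAITING thread and the monitor/synchronizer it is parked on,
        // the same state a SIGQUIT dump would show
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.WAITING) {
                System.out.println(info.getThreadName() + " WAITING on " + info.getLockName());
            }
        }
    }
}
```

This is handy when attaching a shell to the host is impractical, e.g. in CSIT jobs where only application logs are collected.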

Comment by Mohamed ElSerngawy [ 20/Apr/17 ]

I just tried again with the openflowplugin distro from master and got the same issue. It is so weird because after restarting ODL it works fine. It may be something related to the datastore (just thoughts).

Comment by Mohamed ElSerngawy [ 20/Apr/17 ]

Attachment logs.zip has been added with description: Log Trace enabled

Comment by Guy Sela [ 20/Apr/17 ]

(In reply to Mohamed El-Serngawy from comment #17)
> Created attachment 1739 [details]
> Log Trace enabled
>
> I just tried again with the openflowplugin distro from master and got the
> same issue. It is so weird because after restarting ODL it works fine. It
> may be something related to the datastore (just thoughts).

Can you check if you have a "stuck" thread like the trace I pasted here?

Comment by Guy Sela [ 20/Apr/17 ]

If it is the same issue we are experiencing, I suspect there is a bug in AbstractClusterSingletonServiceProviderImpl.

I think this method should be synchronized, because two services with the same serviceIdentifier could call it concurrently:

public final ClusterSingletonServiceRegistration registerClusterSingletonService(
        @CheckForNull final ClusterSingletonService service) {
    LOG.debug("Call registrationService {} method for ClusterSingletonService Provider {}", service,
            this.getClass().getName());

    Preconditions.checkArgument(service != null);
    Preconditions.checkArgument(!Strings.isNullOrEmpty(service.getIdentifier().getValue()),
            "ClusterSingletonService idetnifier can not be null. {}", service);

    final String serviceIdentifier = service.getIdentifier().getValue();
    ClusterSingletonServiceGroup<P, E, C> serviceGroup = serviceGroupMap.get(serviceIdentifier);
    if (serviceGroup == null) {
        final E mainEntity = createEntity(SERVICE_ENTITY_TYPE, serviceIdentifier);
        final E closeEntity = createEntity(CLOSE_SERVICE_ENTITY_TYPE, serviceIdentifier);
        serviceGroup = new ClusterSingletonServiceGroupImpl<>(serviceIdentifier, mainEntity, closeEntity,
                entityOwnershipService, serviceGroupMap);
        serviceGroupMap.put(service.getIdentifier().getValue(), serviceGroup);
        serviceGroup.initializationClusterSingletonGroup();
    }

    return serviceGroup.registerService(service);
}
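A hedged sketch of the kind of fix being suggested (simplified names, not the actual mdsal code): rather than synchronizing the whole method, the check-then-create race on serviceGroupMap can be closed with ConcurrentHashMap.computeIfAbsent, which runs the factory at most once per key, atomically:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AtomicGroupRegistry {
    // Simplified stand-in for serviceGroupMap; the real map holds
    // ClusterSingletonServiceGroup values keyed by serviceIdentifier
    private static final Map<String, Object> serviceGroupMap = new ConcurrentHashMap<>();

    // Two threads registering the same identifier concurrently now get the
    // same group object instead of racing to create two of them
    static Object getOrCreateGroup(String serviceIdentifier) {
        return serviceGroupMap.computeIfAbsent(serviceIdentifier, id -> new Object());
    }

    public static void main(String[] args) {
        Object a = getOrCreateGroup("openflow:279506666460743");
        Object b = getOrCreateGroup("openflow:279506666460743");
        System.out.println(a == b); // prints "true"
    }
}
```

Marking registerClusterSingletonService as synchronized would also work, at the cost of serializing all registrations rather than just those for the same key.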

Comment by Mohamed ElSerngawy [ 20/Apr/17 ]

I killed all running Java processes before starting ODL. BTW, restarting ODL does not work every time; I just tried it again and got the same error. For some reason, after setting the controller for the third bridge (I'm using OVS), the previous 2 bridges disconnect and we get into this loop of the slave role for all bridges:

2017-04-20 15:47:00,056 | INFO | pool-16-thread-1 | DeviceMastership | 228 - org.opendaylight.openflowplugin.applications.forwardingrules-manager - 0.5.0.SNAPSHOT | Triggering reconciliation for device openflow:248700584868174
2017-04-20 15:47:00,068 | INFO | on-dispatcher-62 | DeviceMastership | 228 - org.opendaylight.openflowplugin.applications.forwardingrules-manager - 0.5.0.SNAPSHOT | Triggering reconciliation for device openflow:248700584868174
2017-04-20 15:47:02,391 | INFO | entLoopGroup-5-2 | ConnectionAdapterImpl | 209 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.10.0.SNAPSHOT | Hello received
2017-04-20 15:47:02,398 | INFO | entLoopGroup-5-2 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Device openflow:279506666460743 connected.
2017-04-20 15:47:02,399 | INFO | entLoopGroup-5-2 | DeviceManagerImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | ConnectionEvent: Device connected to controller, Device:/10.10.1.60:47397, NodeId:Uri [_value=openflow:279506666460743]
2017-04-20 15:47:02,414 | INFO | entLoopGroup-5-2 | LifecycleServiceImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Starting clustering MASTER services for node openflow:279506666460743
2017-04-20 15:47:02,416 | INFO | entLoopGroup-5-2 | DeviceContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Starting device context cluster services for node openflow:279506666460743
2017-04-20 15:47:02,752 | INFO | pool-20-thread-1 | SalRoleServiceImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | SetRole called with input:SetRoleInput [_controllerRole=BECOMESLAVE, _node=NodeRef [_value=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:279506666460743]]]]}

], augmentation=[]]
2017-04-20 15:47:02,756 | INFO | pool-20-thread-1 | SalRoleServiceImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Requesting state change to BECOMESLAVE
2017-04-20 15:47:02,757 | INFO | pool-20-thread-1 | SalRoleServiceImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | RoleChangeTask called on device:openflow:279506666460743 OFPRole:BECOMESLAVE
2017-04-20 15:47:02,758 | INFO | pool-20-thread-1 | RoleService | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | getGenerationIdFromDevice called for device: openflow:279506666460743
2017-04-20 15:47:10,701 | INFO | entLoopGroup-5-1 | ConnectionAdapterImpl | 209 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.10.0.SNAPSHOT | Hello received
2017-04-20 15:47:10,710 | INFO | entLoopGroup-5-1 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Device openflow:226628663130185 connected.
2017-04-20 15:47:10,715 | INFO | entLoopGroup-5-1 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | An auxiliary connection was added to device: openflow:226628663130185
2017-04-20 15:47:12,762 | WARN | pool-20-thread-1 | DeviceContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | New role BECOMESLAVE was not propagated to device openflow:279506666460743 during 10 sec
2017-04-20 15:47:12,763 | ERROR | pool-20-thread-1 | SalRoleServiceImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | SetRoleService set Role BECOMESLAVE for Node: KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:279506666460743]]]]}

fail . Reason java.util.concurrent.CancellationException: Task was cancelled.
2017-04-20 15:47:12,763 | WARN | pool-20-thread-1 | LifecycleServiceImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Was not able to set role SLAVE to device on node openflow:279506666460743
2017-04-20 15:47:15,974 | INFO | entLoopGroup-5-1 | ConnectionAdapterImpl | 209 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.10.0.SNAPSHOT | Hello received
2017-04-20 15:47:15,980 | INFO | entLoopGroup-5-1 | ContextChainHolderImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Device openflow:279506666460743 connected.
2017-04-20 15:47:15,982 | INFO | entLoopGroup-5-1 | DeviceManagerImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | ConnectionEvent: Device connected to controller, Device:/10.10.1.60:47399, NodeId:Uri [_value=openflow:279506666460743]
2017-04-20 15:47:22,781 | INFO | pool-20-thread-1 | SalRoleServiceImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | SetRole called with input:SetRoleInput [_controllerRole=BECOMESLAVE, _node=NodeRef [_value=KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:279506666460743]]]]}

], augmentation=[]]
2017-04-20 15:47:22,782 | INFO | pool-20-thread-1 | SalRoleServiceImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Requesting state change to BECOMESLAVE
2017-04-20 15:47:22,782 | INFO | pool-20-thread-1 | SalRoleServiceImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | RoleChangeTask called on device:openflow:279506666460743 OFPRole:BECOMESLAVE
2017-04-20 15:47:22,782 | INFO | pool-20-thread-1 | RoleService | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | getGenerationIdFromDevice called for device: openflow:279506666460743
2017-04-20 15:47:32,792 | WARN | pool-20-thread-1 | DeviceContextImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | New role BECOMESLAVE was not propagated to device openflow:279506666460743 during 10 sec
2017-04-20 15:47:32,793 | ERROR | pool-20-thread-1 | SalRoleServiceImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | SetRoleService set Role BECOMESLAVE for Node: KeyedInstanceIdentifier

{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:279506666460743]]]]}

fail . Reason java.util.concurrent.CancellationException: Task was cancelled.
2017-04-20 15:47:32,793 | WARN | pool-20-thread-1 | LifecycleServiceImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Was not able to set role SLAVE to device on node openflow:279506666460743
2017-04-20 15:47:52,431 | INFO | pool-20-thread-1 | StatisticsManagerImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Statistics collection for node openflow:226628663130185 still in progress even after 50 secs
2017-04-20 15:47:52,438 | WARN | pool-20-thread-1 | StatisticsManagerImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Statistics gathering for single node openflow:226628663130185 was not successful: Task was cancelled.
2017-04-20 15:48:06,221 | INFO | pool-20-thread-1 | StatisticsManagerImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Statistics collection for node openflow:244975001850697 still in progress even after 50 secs
2017-04-20 15:48:06,228 | WARN | pool-20-thread-1 | StatisticsManagerImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Statistics gathering for single node openflow:244975001850697 was not successful: Task was cancelled.
2017-04-20 15:48:06,651 | INFO | pool-20-thread-1 | StatisticsManagerImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Statistics collection for node openflow:248700584868174 still in progress even after 50 secs
2017-04-20 15:48:06,659 | WARN | pool-20-thread-1 | StatisticsManagerImpl | 220 - org.opendaylight.openflowplugin.impl - 0.5.0.SNAPSHOT | Statistics gathering for single node openflow:248700584868174 was not successful: Task was cancelled.

Comment by Guy Sela [ 23/Apr/17 ]

I will add that all the threads stuck on Semaphore.acquire() are waiting for this thread, which is stuck on Future.get():
"epollEventLoopGroup-9-4" #350 prio=10 os_prio=0 tid=0x00007fd17842a800 nid=0x4fcd waiting on condition [0x00007fd16e3c6000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)

  • parking to wait for <0x00000000f648f7b0> (a com.google.common.util.concurrent.AbstractFuture$Sync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at org.opendaylight.openflowplugin.impl.util.DeviceInitializationUtils.initializeNodeInformation(DeviceInitializationUtils.java:155)
    at org.opendaylight.openflowplugin.impl.device.DeviceContextImpl.onContextInstantiateService(DeviceContextImpl.java:728)
    at org.opendaylight.openflowplugin.impl.lifecycle.LifecycleServiceImpl.instantiateServiceInstance(LifecycleServiceImpl.java:54)
    at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.registerService(ClusterSingletonServiceGroupImpl.java:171)
    at org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl.registerClusterSingletonService(AbstractClusterSingletonServiceProviderImpl.java:106)
    at Proxy2d983b90_21cf_4858_bbdf_b72c07ad3ed0.registerClusterSingletonService(Unknown Source)
    at Proxy3074a2f9_7555_4633_8019_1f3e7a6a6d1e.registerClusterSingletonService(Unknown Source)
    at org.opendaylight.openflowplugin.impl.lifecycle.LifecycleServiceImpl.registerService(LifecycleServiceImpl.java:137)
    at org.opendaylight.openflowplugin.impl.device.DeviceManagerImpl.onDeviceContextLevelUp(DeviceManagerImpl.java:171)
    at org.opendaylight.openflowplugin.impl.rpc.RpcManagerImpl.onDeviceContextLevelUp(RpcManagerImpl.java:87)
    at org.opendaylight.openflowplugin.impl.statistics.StatisticsManagerImpl.onDeviceContextLevelUp(StatisticsManagerImpl.java:121)
    at org.opendaylight.openflowplugin.impl.device.DeviceManagerImpl.deviceConnected(DeviceManagerImpl.java:249)
    at org.opendaylight.openflowplugin.impl.connection.listener.HandshakeListenerImpl$1.onSuccess(HandshakeListenerImpl.java:75)
    at org.opendaylight.openflowplugin.impl.connection.listener.HandshakeListenerImpl$1.onSuccess(HandshakeListenerImpl.java:68)
    at com.google.common.util.concurrent.Futures$6.run(Futures.java:1319)

Comment by Abhijit Kumbhare [ 27/Apr/17 ]

From the meeting: Jozef thinks it is a bug in MD-SAL. Jozef to follow up with MD-SAL folks.

Comment by Abhijit Kumbhare [ 04/May/17 ]

Still under investigation (Miroslav and Tomas).

Comment by Abhijit Kumbhare [ 04/May/17 ]

Adding Anil, as he may be able to advise.

Comment by Jozef Bacigal [ 05/May/17 ]

Guys, I can't find anything in the logs you attached. Please repeat this issue on the master branch with

org.opendaylight.openflowplugin.impl DEBUG

set, and with a bigger log file size.

Thank you.

Comment by Tomas Slusny [ 15/May/17 ]

Also, can you try this with this patch: https://git.opendaylight.org/gerrit/#/c/56838/ ? The reason the device refuses to become SLAVE is unknown, but in this case the future mentioned earlier was waiting indefinitely for a switch response that never came, and the patch I created adds a timeout for this case.
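The failure mode described here, an unbounded Future.get() waiting on a switch reply that never arrives, and the effect of bounding it with a timeout can be illustrated with plain java.util.concurrent (a generic sketch; the names and durations are illustrative, not taken from the patch):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWait {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Stand-in for a switch reply that never arrives in time
        Future<String> reply = pool.submit(() -> {
            Thread.sleep(60_000);
            return "multipart-reply";
        });
        try {
            // Bounded wait instead of the indefinite get() seen in the stack trace
            String answer = reply.get(100, TimeUnit.MILLISECONDS);
            System.out.println(answer);
        } catch (TimeoutException e) {
            // Cancelling surfaces to other waiters as CancellationException,
            // matching the "Task was cancelled" errors in the logs above
            reply.cancel(true);
            System.out.println("device did not answer in time");
        }
        pool.shutdownNow();
    }
}
```

Without the timeout overload, get() parks the caller forever, which is exactly how one unresponsive bridge can wedge the registration path for every other bridge.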

Comment by Jozef Bacigal [ 03/Jul/17 ]

It has been about a month with no response, so I assume the issue is fixed and am closing this bug. Feel free to reopen it if the issue persists.

Generated at Wed Feb 07 20:33:36 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.