[OPNFLWPLUG-599] He: FlowCapableInventoryProvider thread dies Created: 20/Jan/16  Updated: 27/Sep/21  Resolved: 23/Mar/16

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Guy Sela Assignee: Bhalaji Narayanan
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Linux
Platform: All


Attachments: Java Source File FlowCapableInventoryProvider.java    
External issue ID: 5020

 Description   

The thread that polls from the queue (named “FlowCapableInventoryProvider”) died on our system because of an exception in one of the handlers.
The run() loop should be protected from exceptions.
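
As an illustration of what protecting the run() loop could look like, here is a minimal sketch. It is not the attached FlowCapableInventoryProvider code or the merged patch; `QueuedOperation`, `GuardedWorker` and `GuardedWorkerDemo` are hypothetical names invented for this example. The idea is that each operation runs inside its own try-catch, so one failing handler is recorded and skipped while the thread keeps polling.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a queued inventory operation (not the plugin's real type).
interface QueuedOperation {
    void apply() throws Exception;
}

class GuardedWorker implements Runnable {
    private final BlockingQueue<QueuedOperation> queue;
    private final AtomicInteger completed = new AtomicInteger();
    private final AtomicInteger failed = new AtomicInteger();

    GuardedWorker(BlockingQueue<QueuedOperation> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                QueuedOperation op = queue.take();
                try {
                    op.apply();
                    completed.incrementAndGet();
                } catch (Exception e) {
                    // A failing handler is logged and skipped; the loop keeps polling.
                    failed.incrementAndGet();
                    System.err.println("operation failed: " + e);
                }
            } catch (InterruptedException e) {
                // Restore the interrupt flag so the loop condition ends the thread.
                Thread.currentThread().interrupt();
            }
        }
    }

    int completed() { return completed.get(); }
    int failed() { return failed.get(); }
}

class GuardedWorkerDemo {
    // Queues three operations, the second of which throws, and returns {completed, failed}.
    static int[] runDemo() {
        BlockingQueue<QueuedOperation> queue = new LinkedBlockingDeque<>(4);
        GuardedWorker worker = new GuardedWorker(queue);
        Thread thread = new Thread(worker, "guarded-worker");
        thread.start();
        try {
            queue.put(() -> { });
            queue.put(() -> { throw new IllegalArgumentException("identifier mismatch"); });
            queue.put(() -> { });
            while (worker.completed() + worker.failed() < 3) {
                Thread.sleep(10);
            }
            thread.interrupt();
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { worker.completed(), worker.failed() };
    }

    public static void main(String[] args) {
        int[] result = runDemo();
        System.out.println("completed=" + result[0] + " failed=" + result[1]);
    }
}
```

In this sketch the operation queued after the failing one still runs, which is the behaviour the description asks for.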

The Exception that made the thread die:
Exception in thread "FlowCapableInventoryProvider" java.lang.IllegalArgumentException: Instance identifier references (urn:opendaylight:inventory?revision=2013-08-19)node-connector[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:4:3}] but data identifier is ImmutableMapEntryNode{nodeIdentifier=(urn:opendaylight:inventory?revision=2013-08-19)node-connector[{(urn:opendaylight:inventory?revision=2013-08-19)id=3}], value=[ImmutableLeafNode{nodeIdentifier=(urn:opendaylight:inventory?revision=2013-08-19)id, value=3, attributes={}}], attributes={}}

This caused deadlocks in the system, because inventory change operations are now stuck.

This is the thread that polls from the Notifications Queue. It is stuck because the FlowCapableInventoryProvider thread is dead:
"pool-31-thread-1" prio=10 tid=0x00007fd9d4f02000 nid=0x17e7 waiting on condition [0x00007fda80f70000]
java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x000000072b8d69c8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(Unknown Source)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
    at java.util.concurrent.LinkedBlockingDeque.putLast(Unknown Source)
    at java.util.concurrent.LinkedBlockingDeque.put(Unknown Source)
    at org.opendaylight.openflowplugin.applications.inventory.manager.FlowCapableInventoryProvider.enqueue(FlowCapableInventoryProvider.java:60)
    at org.opendaylight.openflowplugin.applications.inventory.manager.NodeChangeCommiter.onNodeRemoved(NodeChangeCommiter.java:125)
    - locked <0x000000072b8d6a40> (a org.opendaylight.openflowplugin.applications.inventory.manager.NodeChangeCommiter)
    at org.opendaylight.yangtools.yang.binding.util.NotificationListenerInvoker.invokeNotification(NotificationListenerInvoker.java:91)
    at org.opendaylight.controller.md.sal.binding.impl.BindingDOMNotificationListenerAdapter.onNotification(BindingDOMNotificationListenerAdapter.java:44)
    at org.opendaylight.controller.md.sal.dom.broker.impl.DOMNotificationRouterEvent.deliverNotification(DOMNotificationRouterEvent.java:50)
    at org.opendaylight.controller.md.sal.dom.broker.impl.DOMNotificationRouter$1.onEvent(DOMNotificationRouter.java:68)
    at org.opendaylight.controller.md.sal.dom.broker.impl.DOMNotificationRouter$1.onEvent(DOMNotificationRouter.java:65)
    at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

As a result, the Notifications Queue is full and new Notifications cannot be put on it:
"nioEventLoopGroup-11-83" prio=10 tid=0x00007fd698061800 nid=0xb7bf runnable [0x00007fd5a6997000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
at com.lmax.disruptor.RingBuffer.next(RingBuffer.java:246)
at org.opendaylight.controller.md.sal.dom.broker.impl.DOMNotificationRouter.putNotification(DOMNotificationRouter.java:200)
at org.opendaylight.controller.md.sal.binding.impl.BindingDOMNotificationPublishServiceAdapter.putNotification(BindingDOMNotificationPublishServiceAdapter.java:51)
at org.opendaylight.controller.md.sal.binding.compat.HeliumNotificationProviderServiceAdapter.publish(HeliumNotificationProviderServiceAdapter.java:33)
at org.opendaylight.openflowplugin.openflow.md.core.sal.OFRpcTaskUtil$1FutureCallbackImpl.onSuccess(OFRpcTaskUtil.java:115)
at org.opendaylight.openflowplugin.openflow.md.core.sal.OFRpcTaskUtil$1FutureCallbackImpl.onSuccess(OFRpcTaskUtil.java:101)
….
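
The blocking seen in the traces above can be reproduced in miniature. The sketch below is only an illustration with hypothetical names (`StalledQueueDemo`, `offerWithDeadConsumer`), not plugin code. It fills a bounded `LinkedBlockingDeque` that nobody drains, which is the situation once the consumer thread has died. A plain `put()` would then park indefinitely, as `enqueue` does in the first trace; the demo uses a timed `offer` instead so that it can return and report the stall.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

class StalledQueueDemo {
    // Fills a bounded deque with no consumer attached, then tries to add one more element.
    // Returns whether that extra element was accepted within the timeout.
    static boolean offerWithDeadConsumer(int capacity, long timeoutMillis) {
        LinkedBlockingDeque<Integer> queue = new LinkedBlockingDeque<>(capacity);
        for (int i = 0; i < capacity; i++) {
            queue.offer(i); // succeeds until the deque reaches its capacity
        }
        try {
            // put() would wait here forever; the timed offer gives up and returns false.
            return queue.offer(capacity, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("accepted=" + offerWithDeadConsumer(2, 100));
    }
}
```

A timed `offer` only makes the stall visible to the producer; it does not replace keeping the consumer thread alive.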



 Comments   
Comment by Robert Varga [ 24/Jan/16 ]

The first exception points to a coding bug: incorrect data is being stored, because the identifier under which we store the data does not match the data item itself. I think that makes it a release blocker.

Comment by Abhijit Kumbhare [ 25/Jan/16 ]

Guy,

Is this reproducible or did it happen once? Are there steps to reproduce the bug?

Abhijit

Comment by Matanel Deutsch [ 25/Jan/16 ]

Hi,
To reproduce this, do the following:
Use a new YANG augmentation on node-connector-updated, for example:

augment "/inv:node-connector-updated" {
    ext:augment-identifier "new-node-connector-updated";
    description "port status updated notification.";
    uses some-grouping;
}

Generate the Java class from this YANG and publish it through a notification service (NotificationProviderService) using the publish method.

Comment by Trung Quang Nguyen [ 10/Feb/16 ]

This is an update to the FlowCapableInventoryProvider class: two changes in run() and one in submitOperations().

Comment by Trung Quang Nguyen [ 10/Feb/16 ]

Attachment FlowCapableInventoryProvider.java has been added with description: Updated FlowCapableInventoryProvider class

Comment by Trung Quang Nguyen [ 10/Feb/16 ]

This FlowCapableInventoryProvider thread actually has two bugs. The first is a memory leak. If you run an SDN application that installs rules or flows on an OpenFlow-capable switch, heap usage grows over time. The reason is that an ArrayList is allocated on every iteration of the infinite for (;;) loop. To avoid allocating so many times, the declaration of the ArrayList should be moved out of the infinite loop.
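
The change suggested in the comment above could look roughly like the following sketch. It is an assumption about the shape of the loop, not the attached file: `BatchDrainer`, `MAX_BATCH` and `drainInBatches` are hypothetical names. The list is declared once before the `for (;;)` loop and cleared at the start of each iteration instead of being reallocated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingDeque;

class BatchDrainer {
    static final int MAX_BATCH = 3; // hypothetical batch size for the example

    // Drains the queue in batches and returns the size of each batch processed.
    static List<Integer> drainInBatches(BlockingQueue<String> queue) {
        List<Integer> batchSizes = new ArrayList<>();
        // Declared once, outside the loop, and reused for every batch.
        List<String> batch = new ArrayList<>(MAX_BATCH);
        for (;;) {
            batch.clear();
            queue.drainTo(batch, MAX_BATCH);
            if (batch.isEmpty()) {
                break; // the demo stops here; a real worker would block waiting for more work
            }
            batchSizes.add(batch.size());
        }
        return batchSizes;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingDeque<>();
        for (int i = 0; i < 7; i++) {
            queue.add("op-" + i);
        }
        System.out.println(drainInBatches(queue));
    }
}
```

Whether this alone accounts for the heap growth reported above is not confirmed by the sketch; it only shows the reuse pattern.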

The second bug is the mismatch of instance identifiers. In the submitOperations function, the catch block retries submitting the transaction, but the retry is not wrapped in a try-catch block, so the exception is thrown again because the mismatch error is still present. To fix that, we need to add another try-catch block around the retry so that the exception is handled and the FlowCapableInventoryProvider thread does not crash.
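
A guarded retry of the kind described above might be sketched as follows. This is not the real submitOperations code; `Submittable` and `GuardedRetry.submitWithOneRetry` are hypothetical names, and the real method works with data broker transactions rather than this simplified interface. The point is only that the retry inside the catch block has its own try-catch, so a second failure is reported to the caller instead of escaping and killing the thread.

```java
// Hypothetical stand-in for submitting a transaction (not the real MD-SAL API).
interface Submittable {
    void submit() throws Exception;
}

class GuardedRetry {
    // Tries once, retries once on failure, and never lets the second failure propagate.
    // Returns true if either attempt succeeded.
    static boolean submitWithOneRetry(Submittable tx) {
        try {
            tx.submit();
            return true;
        } catch (Exception first) {
            System.err.println("submit failed, retrying: " + first);
            try {
                tx.submit();
                return true;
            } catch (Exception second) {
                // The retry is guarded too, so a persistent error ends here.
                System.err.println("retry failed, giving up: " + second);
                return false;
            }
        }
    }

    public static void main(String[] args) {
        boolean ok = submitWithOneRetry(() -> {
            throw new IllegalArgumentException("identifier mismatch");
        });
        System.out.println("submitted=" + ok);
    }
}
```

Because the identifier mismatch is deterministic, the retry in this example fails as well; the guard only keeps the failure from propagating.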

Comment by Shuva Jyoti Kar [ 11/Feb/16 ]

https://git.opendaylight.org/gerrit/34462

Comment by Anil Vishnoi [ 11/Feb/16 ]

Shuva, it looks like this patch is abandoned. Is there any other patch that you pushed?

Comment by Abhijit Kumbhare [ 11/Feb/16 ]

Patch in stable/beryllium:

https://git.opendaylight.org/gerrit/#/c/34515/

Comment by Abhijit Kumbhare [ 11/Feb/16 ]

Patch in master:

https://git.opendaylight.org/gerrit/#/c/34505/4

Comment by Trung Quang Nguyen [ 12/Feb/16 ]

So is the memory leak bug neglected in your patch, or is it out of scope for this bug? The memory leak is even more crucial because, in my case, after I fixed it I have not seen the FlowCapableInventoryProvider thread die again.

Generated at Wed Feb 07 20:32:54 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.