[NETCONF-284] Deadlock between filterNotification of NetconfDevice and onSessionDown of NetconfDeviceCommunicator Created: 24/Sep/16 Updated: 15/Mar/19 Resolved: 30/Sep/16 |
|
| Status: | Resolved |
| Project: | netconf |
| Component/s: | netconf |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Kangqian Yin | Assignee: | Kangqian Yin |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: Linux |
||
| External issue ID: | 6797 |
| Description |
|
Recently we hit a deadlock between filterNotification of NetconfDevice and onSessionDown of NetconfDeviceCommunicator. filterNotification was running in a thread of the remote-connector-processing-executor thread pool, filtering a NetconfCapabilityChange notification whose handling causes the netconf connector to disconnect from the ODL internal netconf server. onSessionDown was running in a netty thread after the netty channel of the netconf client received the close operation issued by that disconnect. The two threads take the same two locks in opposite order. onSessionDown first holds sessionLock and then calls NetconfDevice's onRemoteSessionDown, which calls NotificationHandler's onRemoteSchemaDown, a synchronized method that needs the monitor of NotificationHandler. Meanwhile, filterNotification of NetconfDevice has held that monitor since the beginning of its execution, and its next step is to call NetconfDeviceCommunicator's onSessionTerminated, which also tries to acquire sessionLock. Each thread waits for the lock the other one holds, so both block forever. The deadlock information from jstack is as follows: Found one Java-level deadlock: Java stack information for the threads listed above:
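The lock-order inversion described above can be reproduced in isolation. The sketch below is a hypothetical, simplified model, not the ODL code: `Handler` stands in for NotificationHandler, a local `ReentrantLock` stands in for sessionLock, and the two threads model the netty thread and the remote-connector-processing-executor thread. It uses the JDK's `ThreadMXBean.findDeadlockedThreads()`, the programmatic counterpart of the "Found one Java-level deadlock" report from jstack.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderDeadlockDemo {

    // Hypothetical stand-in for NotificationHandler: its methods use the handler's monitor.
    static final class Handler {
        synchronized void onRemoteSchemaDown() {
            // Would clear schema-dependent state; the body does not matter here.
        }
    }

    /** Returns true if the JVM reports a deadlock within timeoutMillis. */
    static boolean reproduce(long timeoutMillis) {
        Handler handler = new Handler();
        ReentrantLock sessionLock = new ReentrantLock(); // hypothetical stand-in for sessionLock
        // Each thread waits here until both hold their first lock, making the cycle deterministic.
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Netty thread: onSessionDown takes sessionLock, then calls into the handler.
        Thread netty = new Thread(() -> {
            sessionLock.lock();
            try {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                handler.onRemoteSchemaDown(); // blocks: monitor held by the executor thread
            } finally {
                sessionLock.unlock();
            }
        }, "netty-io");

        // Executor thread: filterNotification holds the handler monitor,
        // then onSessionTerminated tries to take sessionLock.
        Thread executor = new Thread(() -> {
            synchronized (handler) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                sessionLock.lock(); // blocks: sessionLock held by the netty thread
                sessionLock.unlock();
            }
        }, "remote-connector-processing-executor");

        // Daemon threads, so the JVM can still exit while they stay deadlocked.
        netty.setDaemon(true);
        executor.setDaemon(true);
        netty.start();
        executor.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
            if (ids != null && ids.length >= 2) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + reproduce(5_000));
    }
}
```

`findDeadlockedThreads()` detects cycles that mix object monitors and ownable synchronizers such as `ReentrantLock`, which is exactly the mixed monitor/sessionLock cycle in this report.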
|
| Comments |
| Comment by Jakub Morvay [ 27/Sep/16 ] |
|
Hi Kangqian, thank you for your report. How many times have you run into this deadlock? |
| Comment by Kangqian Yin [ 27/Sep/16 ] |
|
It deadlocks 100% of the time when installing one of our own features. I've found the deadlock condition and a way to break it. I'll push a patch to fix it later. |
| Comment by Jakub Morvay [ 27/Sep/16 ] |
|
I can confirm this deadlock. The thing is that the onCapabilityChange notification should be cached and, after a successful connection, processed by a thread of remote-connector-processing-executor. Uncached notifications are processed in the netty thread, so they shouldn't deadlock. I simulated this behavior by always caching the onCapabilityChange notification, but I wasn't able to reproduce this deadlock on every reconnect. So if I am missing something, please let me know what other condition has to be met. Also, patches are always welcome, so feel free to push your fix for this deadlock. Since you are working on this, you can also take this bug here in Bugzilla. |
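The caching behavior described in this comment decides which thread processes a notification. The sketch below is a hypothetical, simplified model: `handleNotification`, `onRemoteSchemaUp`, the `passNotifications` flag, and the string notifications are illustrative assumptions, not the actual NotificationHandler API. Notifications that arrive before setup completes are cached and later processed on the executor thread; notifications that arrive afterwards are processed on the netty thread that delivered them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

public class CachingNotificationHandler {
    private final List<String> cached = new ArrayList<>(); // guarded by this
    private final Consumer<String> sink;
    private boolean passNotifications = false;            // guarded by this

    CachingNotificationHandler(Consumer<String> sink) {
        this.sink = sink;
    }

    // Hypothetical entry point, called on the netty thread for each incoming notification.
    synchronized void handleNotification(String notification) {
        if (passNotifications) {
            sink.accept(notification); // uncached: processed on the calling (netty) thread
        } else {
            cached.add(notification);  // cached until the connection is fully set up
        }
    }

    // Hypothetical setup-complete hook, called on the processing-executor thread.
    synchronized void onRemoteSchemaUp() {
        passNotifications = true;
        for (String notification : cached) {
            sink.accept(notification); // cached: processed here while the monitor is held
        }
        cached.clear();
    }

    private static void runOn(String threadName, Runnable task) {
        Thread t = new Thread(task, threadName);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Records "notification@thread" for each processed notification. */
    static List<String> demo() {
        List<String> delivered = Collections.synchronizedList(new ArrayList<>());
        CachingNotificationHandler handler = new CachingNotificationHandler(
                n -> delivered.add(n + "@" + Thread.currentThread().getName()));
        runOn("netty-io", () -> handler.handleNotification("capability-change-1"));
        runOn("remote-connector-processing-executor", handler::onRemoteSchemaUp);
        runOn("netty-io", () -> handler.handleNotification("capability-change-2"));
        return delivered;
    }

    public static void main(String[] args) {
        // prints [capability-change-1@remote-connector-processing-executor, capability-change-2@netty-io]
        System.out.println(demo());
    }
}
```

In this model the cached path runs the sink while the handler's monitor is held; if that sink then needed sessionLock, it would recreate the lock ordering from the description.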
| Comment by Kangqian Yin [ 28/Sep/16 ] |
|
I hit this deadlock in the Beryllium-SR3 distribution. In this version, YangStoreService's notifiyListeners issues the NetconfCapabilityChange notification to registered remote netconf clients. As you say, if the NetconfCapabilityChange notification is processed in the netty thread, the deadlock won't happen. In every deadlock I hit, the NetconfCapabilityChange notification was processed in a thread of remote-connector-processing-executor, within the call tree of NetconfDevice$RecursiveSchemaSetup.setUpSchema, which is submitted by NetconfDevice.onRemoteSessionUp, i.e., triggered when the netconf session reconnects. Filtering the NetconfCapabilityChange causes this netconf session to be disconnected again. The deadlock additionally requires a netty thread to be closing this netconf session at the same time. If so, the two threads race for NetconfDeviceCommunicator's tearDown operation. If the netty thread enters tearDown first, the deadlock happens, because that tearDown cannot take the monitor of NotificationHandler, which the remote-connector-processing-executor thread has held since it started filtering the NetconfCapabilityChange. Nevertheless, reproducing this deadlock is hard: of the several features I've tested, installing only one of them causes it. However, the race on NetconfDeviceCommunicator's tearDown really exists and can be seen from the code of NetconfDeviceCommunicator alone: both onSessionDown and onSessionTerminated call tearDown, and they can execute in two independent threads. |
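One common way to break such a race is to make tearDown idempotent and to call out to the device only after releasing sessionLock. The sketch below is a hypothetical illustration of that pattern, not necessarily the patch that was eventually merged: `SafeCommunicator`, `DeviceListener`, the `tornDown` flag, and the `handlerMonitor` object are assumptions standing in for NetconfDeviceCommunicator, NetconfDevice, and the NotificationHandler monitor.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class SafeCommunicator {
    // Hypothetical stand-in for the device callback that NetconfDevice implements.
    interface DeviceListener {
        void onRemoteSessionDown();
    }

    private final ReentrantLock sessionLock = new ReentrantLock();
    private final AtomicBoolean tornDown = new AtomicBoolean(false);
    private final DeviceListener device;
    private boolean connected = true; // guarded by sessionLock

    SafeCommunicator(DeviceListener device) {
        this.device = device;
    }

    void onSessionDown() {       // netty thread, channel closed
        tearDown();
    }

    void onSessionTerminated() { // any thread, e.g. the processing executor
        tearDown();
    }

    private void tearDown() {
        // Only the first caller performs the teardown; racing or later callers return at once.
        if (!tornDown.compareAndSet(false, true)) {
            return;
        }
        sessionLock.lock();
        try {
            connected = false; // mutate communicator state under sessionLock only
        } finally {
            sessionLock.unlock();
        }
        // Notify the device after releasing sessionLock, so this thread never waits
        // for the handler monitor while holding sessionLock.
        device.onRemoteSessionDown();
    }

    /** Runs the racing scenario; returns the number of device callbacks, or -1 on a hang. */
    static int demo() {
        Object handlerMonitor = new Object(); // stands in for the NotificationHandler monitor
        AtomicInteger downCalls = new AtomicInteger();
        SafeCommunicator comm = new SafeCommunicator(() -> {
            synchronized (handlerMonitor) { // like the synchronized onRemoteSchemaDown
                downCalls.incrementAndGet();
            }
        });
        // Executor thread: holds the handler monitor, then terminates the session.
        Thread executor = new Thread(() -> {
            synchronized (handlerMonitor) {
                comm.onSessionTerminated();
            }
        }, "remote-connector-processing-executor");
        Thread netty = new Thread(comm::onSessionDown, "netty-io");
        executor.setDaemon(true);
        netty.setDaemon(true);
        executor.start();
        netty.start();
        try {
            executor.join(5_000);
            netty.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (executor.isAlive() || netty.isAlive()) ? -1 : downCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("device callbacks: " + demo());
    }
}
```

Whichever thread wins the `compareAndSet` performs the single teardown. If the netty thread wins, it merely waits for the handler monitor without holding sessionLock, and the executor thread's own tearDown returns immediately, so no cycle can form. A trade-off of this design is that the losing caller may return before the teardown has finished.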
| Comment by Kangqian Yin [ 29/Sep/16 ] |