[CONTROLLER-1629] AbstractConcurrentDataBrokerTest causes deadlocks which AbstractDataBrokerTest did not Created: 05/Apr/17 Updated: 25/Jul/23 Resolved: 13/Jul/17 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | mdsal |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Michael Vorburger | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| External issue ID: | 8163 | ||||||||
| Description |
|
The AbstractConcurrentDataBrokerTest, introduced in For example, https://git.opendaylight.org/gerrit/#/c/54353/ replaces AbstractDataBrokerTest in Daexim by AbstractConcurrentDataBrokerTest, and causes DataExportImportAppProviderTest to deadlock... Tali Ben-Meir seems to have hit a similar issue in a not-yet-published new test she's writing for project Federation, according to private email exchange with a thread dump that looks similar. |
| Comments |
| Comment by Michael Vorburger [ 05/Apr/17 ] |
|
Attachment bug8163-jstack-1.txt has been added with description: jstack of deadlock in DataExportImportAppProviderTest when changing from @Deprecated AbstractDataBrokerTest to AbstractConcurrentDataBrokerTest |
| Comment by Tom Pantelis [ 13/Apr/17 ] |
|
The problem is the use of MoreExecutors$DirectExecutorService in the QueuedNotificationManager and the fact that the DataExportImportAppProvider does a blocking get on the submit Future in onDataTreeChanged. The solution is to not use a direct executor for publishing change notifications. |
| Comment by Michael Vorburger [ 14/Apr/17 ] |
|
> DataExportImportAppProvider does a blocking get on the submit Future in onDataTreeChanged I may be too tired to see it - but where? DataExportImportAppProvider has a registerDataTreeChangeListener() in init() with an (anonymous / lambda) onDataTreeChanged which goes into ipcHandler() - but I don't see a get() in there? There definitely are many in other places though (c/55022), so I'm probably just not seeing what you're pointing to... > The problem is the use of MoreExecutors$DirectExecutorService in the QueuedNotificationManager (...) The solution is to not use a direct executor for publishing change notifications. In Daexim or in controller? Oh wait, I'm seeing that this DirectExecutorService is created in AbstractDataBrokerTestCustomizer .. this sounds strangely familiar.. PS, just FTR: This WORKED before switching from the AbstractDataBrokerTest to the AbstractConcurrentDataBrokerTest, and more importantly it works in "production" (with the "real" not test DataBroker) .. so it's a problem in controller utilities? |
| Comment by Tom Pantelis [ 18/Apr/17 ] |
|
(In reply to Michael Vorburger from comment #3) One code path is ipcHandler -> updateNodeStatus. The code in this class really should be changed to be async and not block on Futures. > Is this the discussion we've already had in Yes. > PS, just FTR: This WORKED before switching from the AbstractDataBrokerTest It works in production b/c there are no direct executors used in production core code. CDS uses akka actors and is all async. In this case the InMemoryDOMDataStore executor in the tests cannot not be a direct executor since the DataExportImportAppProvider code does blocking gets in its DTCL callback. |
| Comment by Claudio David Gasparini [ 06/May/17 ] |
|
Hi, investigating BUG-8335 It seems that Im hitting the same issue. dead lock is caused by AbstractConcurrentDataBrokerTest. Created a demo class and test to easily reproduce the issue. moving the bug as confirmed, you can changed it you don't agree |
| Comment by Tom Pantelis [ 04/Jul/17 ] |
|
Is this fixed now? |
| Comment by Michael Vorburger [ 05/Jul/17 ] |
|
> Is this fixed now? Nope, this one slipped through the cracks on my side; never got to this, due to other (internal, downstream) prios .. if you/someone would like to have a go at it, please don't let me stop you.. |
| Comment by Tom Pantelis [ 05/Jul/17 ] |
|
(In reply to Michael Vorburger from comment #7) I commented earlier on the solution based on the original thread dump - do not use a direct executor for publishing change notifications. I can push a patch for that but I suspect some downstream tests may fail intermittently if they're not expecting DTCL notification to be async. |
| Comment by Tom Pantelis [ 06/Jul/17 ] |
|
Submitted https://git.opendaylight.org/gerrit/#/c/60022/ to use a thread pool executor instead of direct executor for DTCL notifications. This fixes the deadlock with the DemoBugTest in https://git.opendaylight.org/gerrit/#/c/56640/. This change may break existing tests if they expect DTCL notifications to be immediate but we can tackle those if/when they occur. If it becomes too problematic we can keep AbstractConcurrentDataBrokerTest using the direct executor and create another version that uses a thread pool although I'd rather not do that. |
| Comment by Michael Vorburger [ 13/Jul/17 ] |
|
> create another version that uses a thread pool although I'd rather not do that FTR: https://git.opendaylight.org/gerrit/#/c/60022/ ultimately ended up having to do this in order not to break backward compatibility, and introduces an option to choose whether or not you want the DataTreeChangeListenerExecutor to be multi threaded or not. |
| Comment by Michael Vorburger [ 13/Jul/17 ] |