[CONTROLLER-1299] Clustering : CDS fails to restart when configuration changes Created: 08/May/15 Updated: 26/May/15 Resolved: 26/May/15 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | mdsal |
| Affects Version/s: | Post-Helium |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Moiz Raja | Assignee: | Moiz Raja |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Issue Links: |
|
||||||||
| External issue ID: | 3156 | ||||||||
| Priority: | Highest | ||||||||
| Description |
|
CDS gets restarted when its configuration is changed using NetConf or when a NetConf mount is added. In this scenario CDS does not restart properly because the Akka actor system used by CDS is terminated when any one of the datastores is stopped and is not recreated when the datastore is recreated. |
| Comments |
| Comment by Tom Pantelis [ 08/May/15 ] |
|
I created https://bugs.opendaylight.org/show_bug.cgi?id=2976 for this same issue. This issue also occurs if you restart the controller. There are actually 2 issues here. The controller restart case is indirectly caused by the feature config pusher and the persistent config pusher both trying to start the modules. Having both pushers do the same thing on startup could lead to system instability. The underlying cause is that CDS uses a static ActorSystem and thus tries to reuse it when restarted in the same JVM instance. I think we just need to null out the static field when closed. |
| Comment by Moiz Raja [ 08/May/15 ] |
| Comment by Moiz Raja [ 08/May/15 ] |
|
I've tried putting in a more comprehensive fix here. The reproduction steps I used are: 1. Bring up the controller. In the logs you will see an exception saying that you are trying to create an actor on an actor system that is terminating/terminated. In the MBeans you will see that one of the Distributed*DataStores has not been created. The system may continue to work because the IMDS may be used instead of CDS, but in that case persistence will be broken. |
| Comment by Tom Pantelis [ 09/May/15 ] |
|
Yes - those are the issues I saw. (In reply to Moiz Raja from comment #3) |
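
The failure mode discussed above (a static actor system that is terminated when one datastore stops and is then reused on restart) can be sketched roughly as follows. This is not the fix that was merged for this issue. It is a minimal illustration that assumes a reference-counted holder, and every name in it (`FakeActorSystem`, `SharedActorSystem`, `acquire`, `release`) is hypothetical. `FakeActorSystem` is a stand-in for Akka's `ActorSystem` so that the example runs without Akka on the classpath.

```java
/**
 * Hypothetical stand-in for akka.actor.ActorSystem. It only models the
 * "terminated" state and the failure seen when creating an actor afterwards.
 */
final class FakeActorSystem {
    private volatile boolean terminated;

    void terminate() {
        terminated = true;
    }

    boolean isTerminated() {
        return terminated;
    }

    String actorOf(String name) {
        if (terminated) {
            // Mirrors the symptom in the logs: creating an actor on a
            // terminating/terminated actor system fails.
            throw new IllegalStateException(
                "actor system terminated, cannot create " + name);
        }
        return "actor:" + name;
    }
}

/**
 * Hypothetical holder for the shared system. Assumption: the system is
 * terminated only when the last datastore releases it, and the static
 * field is cleared at that point so a later restart creates a fresh one.
 */
final class SharedActorSystem {
    private static FakeActorSystem instance;
    private static int users;

    private SharedActorSystem() {
    }

    /** Called when a datastore starts; creates the system lazily. */
    static synchronized FakeActorSystem acquire() {
        if (instance == null) {
            instance = new FakeActorSystem();
        }
        users++;
        return instance;
    }

    /** Called when a datastore closes. */
    static synchronized void release() {
        if (users == 0) {
            return; // nothing to release
        }
        users--;
        if (users == 0) {
            instance.terminate();
            instance = null; // null out the static field when closed
        }
    }
}
```

With this shape, stopping one datastore leaves the system alive for the other, and a datastore that is recreated after both have stopped gets a new system instead of the terminated one.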