[CONTROLLER-976] Clustering: Leaderless default shard during feature installation. Created: 30/Oct/14 Updated: 19/Oct/17 Resolved: 12/Nov/14 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | mdsal |
| Affects Version/s: | Helium |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Vratko Polak | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Operating System: All | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| External issue ID: | 2284 | ||||||||
| Description |
|
This is a specialization of https://bugs.opendaylight.org/show_bug.cgi?id=1821 to a context similar to https://bugs.opendaylight.org/show_bug.cgi?id=2283.

Yes, applications should not write to a shard lacking a Leader, but installing a karaf feature causes exactly that, and a Leaderless state is likely just at the time when the clustering feature is being installed.

Basically, perhaps only a configurable transaction timeout is needed to make sure the shard finds its Leader in time. It is also possible that the more responsible component should be the config subsystem, detecting this state and retrying for a (configurable) time.

The current workaround is to always verify that the clustering feature is fully ready before installing more features.

A log is attached, showing what happens when an instance (of a 3-node cluster) starts isolated and features are slowly being installed. |
| Comments |
| Comment by Vratko Polak [ 30/Oct/14 ] |
|
Attachment cluster_20141030.log.xz has been added with description: XZipped complete karaf.log |
| Comment by Robert Varga [ 31/Oct/14 ] |
|
It would not be the config subsystem, but the cluster database's Module – createInstance() is required to return a 'working' instance. Since leader re-election can happen at any moment, I think the broker/transaction chain should block applications while the cluster is being formed. |
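One way to picture the blocking behaviour suggested above is a small gate that callers wait on until a leader is known. The following is only a sketch under stated assumptions: `LeaderGate`, `setLeaderPresent` and `awaitLeader` are hypothetical names invented for illustration and are not part of the controller or the clustered datastore. Because re-election can happen at any moment, the gate can be closed again rather than opened only once.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only: not an OpenDaylight class.
final class LeaderGate {
    private final Object lock = new Object();
    private boolean leaderPresent; // false until some election code reports a leader

    /** Called by (assumed) election code whenever leadership is gained or lost. */
    public void setLeaderPresent(boolean present) {
        synchronized (lock) {
            leaderPresent = present;
            if (present) {
                lock.notifyAll(); // wake every caller blocked in awaitLeader()
            }
        }
    }

    /** Blocks until a leader is present or the timeout elapses; returns true if a leader is present. */
    public boolean awaitLeader(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            while (!leaderPresent) {
                long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMillis <= 0) {
                    return false; // timed out; also avoids wait(0), which would block forever
                }
                try {
                    lock.wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return false;
                }
            }
            return true;
        }
    }
}
```

In this sketch an application-facing call would invoke `awaitLeader(...)` before touching the shard and fail with a clear error if it returns `false`, instead of writing into a Leaderless shard.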
| Comment by Tom Pantelis [ 06/Nov/14 ] |
|
https://git.opendaylight.org/gerrit/#/c/12215/ addresses this issue, i.e. on transaction create, the CDS waits (actually retries) a reasonable amount of time (30 sec) for a shard leader to become elected. That patch was merged on Oct 29th and since this bug was reported Oct 30th, I'm assuming you didn't have that patch. |
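The retry-with-deadline idea described in this comment can be sketched roughly as follows. This is not the code from the referenced patch: `LeaderRetry`, `awaitLeader` and the `findLeader` supplier are hypothetical names, and the supplier merely stands in for whatever lookup the datastore performs. The 30 second figure from the comment would be passed in as the `timeout` argument.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical illustration only: not the implementation merged in the patch above.
final class LeaderRetry {
    private LeaderRetry() {}

    /**
     * Repeatedly asks findLeader for a leader until one is returned or the timeout elapses.
     * Returns Optional.empty() on timeout or if the calling thread is interrupted.
     */
    public static <T> Optional<T> awaitLeader(Supplier<Optional<T>> findLeader,
                                              Duration timeout,
                                              Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            Optional<T> leader = findLeader.get();
            if (leader.isPresent()) {
                return leader;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return Optional.empty(); // gave up: no leader within the timeout
            }
            // Sleep for the poll interval, but never past the deadline and at least 1 ms.
            long sleepMillis = Math.max(1L,
                    Math.min(pollInterval.toMillis(), remainingNanos / 1_000_000L));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return Optional.empty();
            }
        }
    }
}
```

A caller might use it as `LeaderRetry.awaitLeader(lookup, Duration.ofSeconds(30), Duration.ofMillis(500))`, where `lookup` is an assumed supplier and the poll interval is an arbitrary choice for illustration.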
| Comment by Moiz Raja [ 11/Nov/14 ] |
|
Is this ok to close? |
| Comment by Vratko Polak [ 12/Nov/14 ] |
|
> Is this ok to close?

Yes, I think FIXED is the correct status right now (as in not CONFIRMED anymore but not VERIFIED yet). |