[CONTROLLER-976] Clustering: Leaderless default shard during feature installation. Created: 30/Oct/14  Updated: 19/Oct/17  Resolved: 12/Nov/14

Status: Resolved
Project: controller
Component/s: mdsal
Affects Version/s: Helium
Fix Version/s: None

Type: Bug
Reporter: Vratko Polak Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: File cluster_20141030.log.xz    
Issue Links:
Blocks
is blocked by CONTROLLER-890 Clustering: Handle shard initializati... Resolved
External issue ID: 2284

 Description   

This is a specialization of https://bugs.opendaylight.org/show_bug.cgi?id=1821 to a context similar to https://bugs.opendaylight.org/show_bug.cgi?id=2283

Yes, applications should not write to a shard that lacks a Leader, but installing a karaf feature causes exactly that, and a Leaderless state is likely just when the clustering feature is being installed.

Perhaps only a configurable transaction timeout is needed to make sure the shard finds its Leader in time.

Alternatively, the responsible component could be the config subsystem, which would detect this state and retry for a (configurable) time.

The current workaround is to always verify that the clustering feature is fully ready before installing more features.

A log is attached, showing what happens when an instance (of a 3-node cluster) starts isolated and features are slowly being installed.



 Comments   
Comment by Vratko Polak [ 30/Oct/14 ]

Attachment cluster_20141030.log.xz has been added with description: XZipped complete karaf.log

Comment by Robert Varga [ 31/Oct/14 ]

It would not be the config subsystem, but the cluster database's Module: createInstance() is required to return a 'working' instance.

Since leader re-election can happen at any moment, I think the broker/transaction chain should block applications while the cluster is being formed.

Comment by Tom Pantelis [ 06/Nov/14 ]

https://git.opendaylight.org/gerrit/#/c/12215/ addresses this issue, i.e. on transaction create, the CDS waits (actually retries) for a reasonable amount of time (30 sec) for a shard leader to be elected.

That patch was merged on Oct 29th and since this bug was reported Oct 30th, I'm assuming you didn't have that patch.
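
For illustration, the retry-until-a-leader-is-elected behaviour described above could look roughly like the sketch below. This is not the CDS code from the Gerrit change: it is a language-neutral idea written in Python, and the names wait_for_leader, find_leader and NoShardLeaderError, as well as the 0.5 sec retry interval, are assumptions made up for this example. Only the 30 sec overall wait comes from the comment.

```python
import time
from typing import Callable, Optional


class NoShardLeaderError(Exception):
    """Raised when no shard leader shows up before the deadline (hypothetical name)."""


def wait_for_leader(
    find_leader: Callable[[], Optional[str]],  # hypothetical lookup; returns None while leaderless
    timeout_s: float = 30.0,                   # overall wait, as mentioned in the comment
    retry_interval_s: float = 0.5,             # assumed pause between attempts
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry the leader lookup until it succeeds or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        leader = find_leader()
        if leader is not None:
            return leader
        remaining = deadline - clock()
        if remaining <= 0:
            raise NoShardLeaderError(f"no shard leader elected within {timeout_s} s")
        # Never sleep past the deadline.
        sleep(min(retry_interval_s, remaining))
```

A caller would run something like wait_for_leader(lookup) before creating the transaction and treat NoShardLeaderError as the "shard still has no Leader" failure. The clock and sleep parameters exist only so the timing can be replaced in tests.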

Comment by Moiz Raja [ 11/Nov/14 ]

Is this ok to close?

Comment by Vratko Polak [ 12/Nov/14 ]

> Is this ok to close?

Yes, I think FIXED is the correct status right now (as in not CONFIRMED anymore but not VERIFIED yet).
