[CONTROLLER-1676] After leader is isolated, new leader is elected too slowly Created: 15/May/17  Updated: 25/Jul/23  Resolved: 16/May/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Vratko Polak Assignee: Tomas Cere
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 8447

 Description   

This is affecting CSIT results at the node which ends up being a follower in the majority partition, so this might be related to AAA-130.

This prevents testing a scenario where the isolation is short enough that the client does not time out.

In the recent test, it took 44 seconds to confirm that both members agree on the new leader, and ODL took [0] almost 5 seconds to respond to the final request.

Karaf.log [1] shows several suspicious lines between 06:23:51,346 and 06:24:49,542, but it is not clear to me what causes the process to take this long.

[0] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/684/archives/log.html.gz#s1-s33-t1-k2-k11-k2-k1-k1-k6-k2-k1-k5-k1-k3-k1
[1] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/684/archives/odl3_karaf.log.gz



 Comments   
Comment by Vratko Polak [ 15/May/17 ]

This also affects [2] prefix-based shards.

[2] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/684/archives/log.html.gz#s1-s35-t2-k2-k11-k1-k1-k1-k6-k2-k1-k2-k1-k1-k3-k3-k1

Comment by Tomas Cere [ 16/May/17 ]

Upon investigation, it seems the leader was elected in ~5 seconds. However, Jolokia on member-3 seemed to need around 45 seconds of retries before it produced a response within the 5 seconds that the Robot suites expect. I will add RPCs to cluster-admin for getting the shard role, to get around Jolokia.

Comment by Tomas Cere [ 16/May/17 ]

These changes introduce the get-shard-role RPCs into cluster-admin, which should bypass the slowness of Jolokia:

https://git.opendaylight.org/gerrit/#/c/57131/
https://git.opendaylight.org/gerrit/#/c/57133/
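
For illustration only, here is a minimal Python sketch of how a test could poll the shard role through such an RPC instead of Jolokia. It is not taken from the changes above. The RESTCONF path, the input fields (shard-name, data-store-type), the output field (role), and the role string "Leader" are assumptions that should be checked against the cluster-admin YANG model. The `post` callable is a hypothetical stand-in for whatever HTTP client the suite uses.

```python
import json
import time
from typing import Callable, Iterable, Optional

# Assumed RESTCONF path; verify against the cluster-admin model before use.
RPC_PATH = "/restconf/operations/cluster-admin:get-shard-role"


def build_request(shard_name: str, data_store_type: str) -> str:
    """Build the JSON body for the (assumed) get-shard-role RPC input."""
    return json.dumps(
        {"input": {"shard-name": shard_name, "data-store-type": data_store_type}}
    )


def parse_role(response_body: str) -> Optional[str]:
    """Return the role from the RPC output, or None if it is absent."""
    return json.loads(response_body).get("output", {}).get("role")


def wait_for_leader(
    post: Callable[[str, str, str], str],  # hypothetical: (member, path, body) -> response body
    members: Iterable[str],
    shard_name: str,
    data_store_type: str,
    timeout_s: float = 10.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll all members until exactly one reports the Leader role.

    Returns the leader's name, or None if the timeout expires first.
    """
    members = list(members)
    body = build_request(shard_name, data_store_type)
    deadline = clock() + timeout_s
    while True:
        leaders = []
        for member in members:
            try:
                role = parse_role(post(member, RPC_PATH, body))
            except Exception:
                # An unreachable (for example isolated) member or a bad
                # response simply contributes no role in this round.
                role = None
            if role == "Leader":  # assumed role string
                leaders.append(member)
        if len(leaders) == 1:
            return leaders[0]
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

Because the HTTP call is injected, the polling logic can be exercised without a running cluster, and the timeout and interval can be tuned to the short isolation window described in the report.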
