[CONTROLLER-1676] After leader is isolated, new leader is elected too slowly Created: 15/May/17 Updated: 25/Jul/23 Resolved: 16/May/17 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Vratko Polak | Assignee: | Tomas Cere |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| External issue ID: | 8447 |
| Description |
|
This is affecting CSIT results at the node which ends up being a follower in the majority partition, so this might be related to This prevents testing a scenario in which the isolation is short enough that the client does not time out. In the recent test, it took 44 seconds to confirm that both members agree on the new leader, and ODL took almost 5 seconds to respond to the final request [0]. Karaf.log [1] shows several suspicious lines between 06:23:51,346 and 06:24:49,542, but it is not clear to me what causes the process to take this long.
[0] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/684/archives/log.html.gz#s1-s33-t1-k2-k11-k2-k1-k1-k6-k2-k1-k5-k1-k3-k1 |
| Comments |
| Comment by Vratko Polak [ 15/May/17 ] |
|
This also affects [2] prefix-based shards. |
| Comment by Tomas Cere [ 16/May/17 ] |
|
Upon investigation, it seems the leader was elected in ~5 seconds. However, Jolokia on member-3 needed around 45 seconds of retries before it produced a response within the 5 seconds that the Robot suites expect. I will add RPCs to cluster-admin for getting the shard role, to get around Jolokia. |
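
To illustrate why the measured time can be much longer than the election itself, here is a minimal sketch of a retry loop of the kind described above. It is not the Robot suite code: `wait_for_role`, its parameters, and the role names are hypothetical, and `fetch_role` stands in for whatever call reads the shard role (a Jolokia read, for example). The elapsed time it reports includes every failed or slow attempt, so a slow probe inflates the apparent election time.

```python
import time
from typing import Callable, Optional, Tuple


def wait_for_role(
    fetch_role: Callable[[], Optional[str]],
    accepted_roles: frozenset = frozenset({"Leader", "Follower"}),
    overall_timeout: float = 60.0,
    retry_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, float, int]:
    """Poll fetch_role until it yields an accepted role.

    Returns (role, elapsed_seconds, attempts). The elapsed time covers
    all retries, not just the time the cluster needed to elect a leader.
    """
    start = time.monotonic()
    attempts = 0
    while True:
        attempts += 1
        try:
            role = fetch_role()  # hypothetical probe; may be slow or fail
        except Exception:
            role = None  # treat a failed probe like "no answer yet"
        elapsed = time.monotonic() - start
        if role in accepted_roles:
            return role, elapsed, attempts
        if elapsed >= overall_timeout:
            raise TimeoutError(
                f"no usable role after {attempts} attempts ({elapsed:.1f}s)"
            )
        sleep(retry_interval)
```

Logging the attempt count next to the elapsed time makes it easier to tell a slow election apart from a slow probe.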
| Comment by Tomas Cere [ 16/May/17 ] |
|
These changes introduce the get-shard-role RPCs into cluster-admin, which should bypass the slowness of Jolokia: https://git.opendaylight.org/gerrit/#/c/57131/
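
A rough sketch of how a test might prepare a call to such an RPC over RESTCONF follows. Everything beyond the RPC name is an assumption not stated in this ticket: the RESTCONF operations path, port 8181, and the input leaf names `shard-name` and `data-store-type` should be checked against the Gerrit change. The helper `build_get_shard_role_request` is hypothetical and only builds the request; it does not send it.

```python
import json
from typing import Tuple


def build_get_shard_role_request(
    host: str,
    shard_name: str,
    datastore_type: str,
    port: int = 8181,  # assumed RESTCONF port
) -> Tuple[str, str]:
    """Return (url, json_body) for a get-shard-role RPC call.

    The URL layout and input leaf names are assumptions; verify them
    against the cluster-admin model before relying on this.
    """
    url = (
        f"http://{host}:{port}"
        "/restconf/operations/cluster-admin:get-shard-role"
    )
    body = json.dumps(
        {"input": {"shard-name": shard_name, "data-store-type": datastore_type}}
    )
    return url, body
```

For example, `build_get_shard_role_request("member-3.example", "default", "config")` yields a URL and a JSON body that an HTTP client could POST; the host name here is a placeholder.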