Details
Bug
Status: Resolved
High
Resolution: Done
Fluorine SR1, Oxygen SR4
None
None
Description
When a shard leader is stopped and the remaining cluster members elect a new leader, the next transaction blocks for 30s before completing successfully, as seen for Car Failover Crud in https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-clustering-tell-all-neon/70/robot-plugin/log.html.gz. This test and others have always failed because of this delay.
The problem is that the tell-based frontend (FE) reacts to leader changes only after the fact: it tries to reconnect only when a transaction operation times out. The ask-based implementation is proactive: it flushes its cached shard leader state when the ShardManager notifies it of a leader change. We need to do something similar for tell-based.
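A minimal sketch of the proactive approach, written in Java since the controller is a Java project. All class and method names here (LeaderAwareFrontend, connectionFor, onLeaderChanged) are hypothetical and do not reflect the actual tell-based client API; the leader-change callback is assumed to be driven by a notification similar to the ShardManager one described above. The point is only that the stale connection is replaced when the notification arrives, rather than when a transaction operation times out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: a frontend-side cache of shard leader connections that
// reacts proactively to leader-change notifications instead of waiting for a
// transaction operation to time out.
public class LeaderAwareFrontend {

    /** Hypothetical stand-in for a connection to a specific shard leader. */
    public static final class Connection {
        final String shard;
        final String leader;

        Connection(String shard, String leader) {
            this.shard = shard;
            this.leader = leader;
        }
    }

    private final Map<String, Connection> connections = new HashMap<>();
    private int connectAttempts = 0;

    /** Returns the cached connection for a shard, connecting if none exists. */
    public Connection connectionFor(String shard, String knownLeader) {
        return connections.computeIfAbsent(shard, s -> {
            connectAttempts++;
            return new Connection(s, knownLeader);
        });
    }

    /**
     * Proactive path: invoked when a leader-change notification arrives.
     * The stale connection is dropped right away; if the new leader is
     * known, a new connection is opened immediately.
     */
    public void onLeaderChanged(String shard, String newLeader) {
        Connection current = connections.get(shard);
        if (current != null && Objects.equals(current.leader, newLeader)) {
            return; // same leader, nothing to flush
        }
        connections.remove(shard);
        if (newLeader != null) {
            connectionFor(shard, newLeader);
        }
    }

    public String leaderOf(String shard) {
        Connection c = connections.get(shard);
        return c == null ? null : c.leader;
    }

    public int connectAttempts() {
        return connectAttempts;
    }

    public static void main(String[] args) {
        LeaderAwareFrontend fe = new LeaderAwareFrontend();
        fe.connectionFor("car", "member-1");
        // member-1 is stopped and the remaining members elect member-2.
        fe.onLeaderChanged("car", "member-2");
        System.out.println(fe.leaderOf("car") + " " + fe.connectAttempts());
    }
}
```

Running `main` prints `member-2 2`: one initial connect plus one proactive reconnect, with no timeout involved. A notification with a null leader (election still in progress) only flushes the cached state, leaving the next transaction to connect once a leader is known.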
Attachments
| # | Subject | Branch | Project | Status | Code-Review | Verified |
|---|---|---|---|---|---|---|
| 78029,5 | tell-based - reconnect on leader change | master | controller | MERGED | +2 | +1 |
| 78211,2 | tell-based - reconnect on leader change | stable/fluorine | controller | MERGED | +2 | +1 |