-
Bug
-
Resolution: Done
-
High
-
Fluorine SR1, Oxygen SR4
-
None
-
None
When a shard leader is stopped and the remaining cluster members elect a new leader, the next transaction blocks for 30s before completing successfully, as seen
for Car Failover Crud in https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-clustering-tell-all-neon/70/robot-plugin/log.html.gz. This test and others have always failed because of it.
The problem is that the frontend (FE) is reactive to leader changes, i.e. it only tries to reconnect after a transaction operation times out. The ask-based protocol is proactive, i.e. it flushes shard leader state when the ShardManager notifies it that the leader state has changed. We need to do something similar for the tell-based protocol.
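
To illustrate the proactive approach, here is a minimal sketch, not the actual controller code. All names in it (LeaderChangeListener, ShardLeaderCache, recordResolved, cachedLeader, the member IDs) are hypothetical and only stand in for the frontend's leader bookkeeping and the ShardManager notification. The idea is that a leader-change notification invalidates the stale entry immediately, so the next transaction re-resolves the leader instead of waiting for a request timeout.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical callback for leader-change notifications (not a real ODL interface). */
interface LeaderChangeListener {
    void onLeaderChanged(String shardName, String newLeaderId);
}

/** Hypothetical frontend-side cache of resolved shard leaders. */
final class ShardLeaderCache implements LeaderChangeListener {
    private final Map<String, String> leaders = new ConcurrentHashMap<>();

    /** Remember the leader the frontend last resolved for a shard. */
    void recordResolved(String shardName, String leaderId) {
        leaders.put(shardName, leaderId);
    }

    /** Return the cached leader, or empty if the shard must be re-resolved. */
    Optional<String> cachedLeader(String shardName) {
        return Optional.ofNullable(leaders.get(shardName));
    }

    @Override
    public void onLeaderChanged(String shardName, String newLeaderId) {
        // Proactive flush: drop the entry as soon as it is known to be stale.
        // Returning null from the remapping function removes the mapping.
        leaders.computeIfPresent(shardName,
            (name, cached) -> cached.equals(newLeaderId) ? cached : null);
    }
}

/** Small demo of the flush behaviour. */
final class LeaderCacheDemo {
    public static void main(String[] args) {
        ShardLeaderCache cache = new ShardLeaderCache();
        cache.recordResolved("car", "member-1");
        // Simulated notification: member-1 was stopped, member-2 was elected.
        cache.onLeaderChanged("car", "member-2");
        System.out.println("needs re-resolve: " + cache.cachedLeader("car").isEmpty());
    }
}
```

In the reactive scheme described above, the stale entry for member-1 would survive until a transaction operation timed out. Here the entry is gone as soon as the notification arrives, so the 30s wait would be avoided under these assumptions.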