[CONTROLLER-1873] Tell-based takes 30s to reconnect after fail-over Created: 20/Nov/18  Updated: 27/Nov/18  Resolved: 27/Nov/18

Status: Resolved
Project: controller
Component/s: None
Affects Version/s: Fluorine SR1, Oxygen SR4
Fix Version/s: Neon, Fluorine SR2

Type: Bug Priority: High
Reporter: Tom Pantelis Assignee: Tom Pantelis
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

When a shard leader is stopped and the remaining cluster elects a new leader, the next transaction blocks for 30s before completing successfully, as seen for Car Failover Crud in https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-clustering-tell-all-neon/70/robot-plugin/log.html.gz. This and other tests have always failed because of this delay.

The problem is that the tell-based front-end (FE) is reactive to leader changes, i.e. it only tries to reconnect when a transaction operation times out. The ask-based front-end is proactive, i.e. it flushes shard leader state when the ShardManager notifies it that leader state has changed. We need to do something similar for tell-based.
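
The difference between the two approaches can be sketched roughly as follows. This is only an illustration, not the controller's actual code: LeaderCache, recordLeader, cachedLeader and onLeaderChanged are hypothetical names, and the idea that leader state is held in a simple per-shard map is an assumption made for the example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of a front-end's view of shard leaders.
 * None of these names come from the controller code base.
 */
final class LeaderCache {
    // Assumption: leader state is a map from shard name to leader address.
    private final Map<String, String> leaderByShard = new ConcurrentHashMap<>();

    /** Remember the leader that was last resolved for a shard. */
    void recordLeader(String shard, String leaderAddress) {
        leaderByShard.put(shard, leaderAddress);
    }

    /** Returns the cached leader, or empty if it must be resolved again. */
    Optional<String> cachedLeader(String shard) {
        return Optional.ofNullable(leaderByShard.get(shard));
    }

    /**
     * Proactive path: invoked when a leader-change notification arrives.
     * Dropping the entry forces the next transaction to resolve the new
     * leader immediately instead of waiting for a request to time out.
     */
    void onLeaderChanged(String shard) {
        leaderByShard.remove(shard);
    }
}
```

In a purely reactive design, nothing like onLeaderChanged is ever called, so a stale entry is only discovered once an operation sent to the old leader times out.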



 Comments   
Comment by Tom Pantelis [ 27/Nov/18 ]

Master: https://git.opendaylight.org/gerrit/#/c/78029/
