CONTROLLER-1873 (project: controller)

Tell-based takes 30s to reconnect after fail-over


Details

    • Type: Bug
    • Status: Resolved
    • Priority: High
    • Resolution: Done
    • Affects Version/s: Fluorine SR1, Oxygen SR4
    • Fix Version/s: Neon, Fluorine SR2
    • Component/s: None
    • Labels: None

    Description

      When a shard leader is stopped and the remaining cluster members elect a new leader, the next transaction blocks for 30s before completing successfully, as seen
      for Car Failover Crud in https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-clustering-tell-all-neon/70/robot-plugin/log.html.gz. This test and others have always failed because of this delay.

      The problem is that the tell-based frontend (FE) is reactive to leader changes, i.e., it only tries to reconnect after a transaction operation times out. The ask-based protocol is proactive, i.e., it flushes its cached shard leader state when the ShardManager notifies it of a leader change. We need to do something similar for tell-based.
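      A minimal, hypothetical sketch of why the reactive approach stalls and the proactive one does not, assuming a frontend that caches one leader per shard and a request timeout equal to the ~30s delay observed in the test. ShardLeaderCache, LeaderResolver, SimulatedCluster, submitTransaction and REQUEST_TIMEOUT_SECONDS are illustrative names invented for this sketch; they are not the controller's actual classes or API, and the real fix may be structured differently.

```java
import java.util.Objects;

// Hypothetical sketch: none of these names come from the OpenDaylight controller code base.
public class LeaderFailoverSketch {

    /** Looks up the current leader of a shard (stands in for a ShardManager query). */
    interface LeaderResolver {
        String resolveLeader(String shardName);
    }

    /** Frontend-side cache of the last known leader of one shard. */
    static final class ShardLeaderCache {
        private final String shardName;
        private final LeaderResolver resolver;
        private String cachedLeader; // null means "unknown, resolve on next use"

        ShardLeaderCache(String shardName, LeaderResolver resolver) {
            this.shardName = Objects.requireNonNull(shardName);
            this.resolver = Objects.requireNonNull(resolver);
        }

        /** Returns the cached leader, resolving it first if the cache was flushed. */
        synchronized String leader() {
            if (cachedLeader == null) {
                cachedLeader = resolver.resolveLeader(shardName);
            }
            return cachedLeader;
        }

        /** Reactive path: called only after a request to the cached leader timed out. */
        synchronized void onRequestTimeout() {
            cachedLeader = null;
        }

        /** Proactive path: called when a leader-change notification arrives. */
        synchronized void onLeaderChanged() {
            cachedLeader = null;
        }
    }

    /** Simulated cluster whose leader can be switched to model a fail-over. */
    static final class SimulatedCluster implements LeaderResolver {
        String currentLeader = "member-1";

        @Override
        public String resolveLeader(String shardName) {
            return currentLeader;
        }
    }

    // Assumed request timeout, chosen to match the ~30s stall reported in the issue.
    static final long REQUEST_TIMEOUT_SECONDS = 30;

    /** Returns the simulated seconds a transaction waits before reaching the real leader. */
    static long submitTransaction(ShardLeaderCache cache, SimulatedCluster cluster) {
        long waited = 0;
        while (!cache.leader().equals(cluster.currentLeader)) {
            waited += REQUEST_TIMEOUT_SECONDS; // the stale leader never answers
            cache.onRequestTimeout();          // only now does the frontend re-resolve
        }
        return waited;
    }

    public static void main(String[] args) {
        SimulatedCluster reactiveCluster = new SimulatedCluster();
        ShardLeaderCache reactive = new ShardLeaderCache("cars", reactiveCluster);
        reactive.leader();                          // cache member-1
        reactiveCluster.currentLeader = "member-2"; // member-1 stopped, member-2 elected
        System.out.println("reactive wait: " + submitTransaction(reactive, reactiveCluster) + "s");

        SimulatedCluster proactiveCluster = new SimulatedCluster();
        ShardLeaderCache proactive = new ShardLeaderCache("cars", proactiveCluster);
        proactive.leader();                         // cache member-1
        proactiveCluster.currentLeader = "member-2";
        proactive.onLeaderChanged();                // notification flushes the stale leader
        System.out.println("proactive wait: " + submitTransaction(proactive, proactiveCluster) + "s");
    }
}
```

      Running main prints "reactive wait: 30s" and then "proactive wait: 0s". In this model the reactive cache only discovers the new leader by paying a full request timeout, while flushing the cache on the leader-change notification lets the next transaction resolve the new leader immediately.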



          People

            Assignee: Tom Pantelis (tpantelis)
            Reporter: Tom Pantelis (tpantelis)
            Votes: 0
            Watchers: 2
