controller / CONTROLLER-1858

DOMDataBroker Explicit Leader Movement test fails


    • Type: Bug
    • Resolution: Unresolved
    • Priority: High
    • None
    • Oxygen SR3
    • clustering

      The DOMDataBroker Explicit Leader Movement test, controller-ddb-expl-lead-movement-longevity, has been failing fairly consistently [1]. The purpose of the test, as stated in the test documentation:

      The goal is to ensure that applications do not observe disruption when a shard leader is moved as the result of explicit application request. This is performed by having a steady-stream producer execute operations against the shard and then initiate shard leader shutdown, then the producer is shut down cleanly.

      Although the job shows 3/4 passing, those 3 are for restarting ODL with tell; the actual test case has been failing consistently, with only one pass, on 28th July. The steps mentioned above are repeated in a loop, and the test case fails 1097 times during the entire ~14-hour run with error 500 [2].

      Update:
      -----------
      Test steps detailed in [3].
      ODL Logs [4]

      As per [3], searching the ODL logs for 'starting leader movement from node' gives us:
      2018-08-04T23:40:26,930 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-34 | ClientActorBehavior | 191 - org.opendaylight.controller.cds-access-client - 1.3.3.SNAPSHOT | member-1-frontend-datastore-config: connection ConnectedClientConnection{client=ClientIdentifier{frontend=member-1-frontend-datastore-config, generation=0}, cookie=0, backend=ShardBackendInfo{actor=Actor[akka://opendaylight-cluster-data/user/shardmanager-config/member-1-shard-default-config#542053028], sessionId=0, version=BORON, maxMessages=1000, cookie=0, shard=default, dataTree=present}} indicated sequencing mismatch on member-1-datastore-config-fe-0-txn-13-0 sequence 0 (26), reconnecting it
      org.opendaylight.controller.cluster.access.commands.OutOfSequenceEnvelopeException: Expecting envelope 0
      at org.opendaylight.controller.cluster.datastore.LeaderFrontendState.checkRequestSequence(LeaderFrontendState.java:98) [220:org.opendaylight.controller.sal-distributed-datastore:1.7.3.SNAPSHOT]
      at org.opendaylight.controller.cluster.datastore.LeaderFrontendState.handleTransactionRequest(LeaderFrontendState.java:191) [220:org.opendaylight.controller.sal-distributed-datastore:1.7.3.SNAPSHOT]
      at org.opendaylight.controller.cluster.datastore.Shard.handleRequest(Shard.java:573) [220:org.opendaylight.controller.sal-distributed-datastore:1.7.3.SNAPSHOT]
      at org.opendaylight.controller.cluster.datastore.Shard.handleRequestEnvelope(Shard.java:398) [220:org.opendaylight.controller.sal-distributed-datastore:1.7.3.SNAPSHOT]
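      To make the "Expecting envelope 0" message easier to read, here is a toy model of a per-frontend sequence check. It is only a sketch: the class and method names are hypothetical, it is not the actual LeaderFrontendState code, and reading the "(26)" in the log line as the envelope sequence sent by the client is an assumption. It shows one way such a mismatch can arise (a leader with fresh state receiving a mid-stream envelope) and is not a diagnosis of this failure.

```python
class OutOfSequenceEnvelopeError(Exception):
    """Toy stand-in for the Java OutOfSequenceEnvelopeException."""

    def __init__(self, expected):
        super().__init__(f"Expecting envelope {expected}")
        self.expected = expected


class ToyLeaderState:
    """Hypothetical model: tracks the next envelope sequence expected from one frontend."""

    def __init__(self):
        self.expected_tx_sequence = 0

    def handle(self, tx_sequence):
        # Reject any envelope that is not exactly the next one expected.
        if tx_sequence != self.expected_tx_sequence:
            raise OutOfSequenceEnvelopeError(self.expected_tx_sequence)
        self.expected_tx_sequence += 1


# The client has already sent envelopes 0..25 to the old leader.
old_leader = ToyLeaderState()
for seq in range(26):
    old_leader.handle(seq)

# Assumption: after leader movement the new leader starts with fresh state,
# while the client continues with envelope 26.
new_leader = ToyLeaderState()
try:
    new_leader.handle(26)
    message = None
except OutOfSequenceEnvelopeError as err:
    message = str(err)

print(message)
```

      In this model the client would have to reconnect and restart its envelope numbering before the new leader accepts requests, which is consistent with the "reconnecting it" wording in the log line above.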

      Also:

      2018-08-05T01:17:16,637 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-36 | ClientActorBehavior | 191 - org.opendaylight.controller.cds-access-client - 1.3.3.SNAPSHOT | member-1-frontend-datastore-config: connection ConnectedClientConnection{client=ClientIdentifier{frontend=member-1-frontend-datastore-config, generation=0}, cookie=0, backend=ShardBackendInfo{actor=Actor[akka.tcp://opendaylight-cluster-data@10.30.170.171:2550/user/shardmanager-config/member-2-shard-default-config#-1686300808], sessionId=71, version=BORON, maxMessages=1000, cookie=0, shard=default, dataTree=absent}} indicated no leadership, reconnecting it
      org.opendaylight.controller.cluster.access.commands.NotLeaderException: Actor Actor[akka://opendaylight-cluster-data/user/shardmanager-config/member-2-shard-default-config#-1686300808] is not the current leader
      at org.opendaylight.controller.cluster.datastore.Shard.handleRequest(Shard.java:566) ~[220:org.opendaylight.controller.sal-distributed-datastore:1.7.3.SNAPSHOT]
      at org.opendaylight.controller.cluster.datastore.Shard.handleRequestEnvelope(Shard.java:398) ~[220:org.opendaylight.controller.sal-distributed-datastore:1.7.3.SNAPSHOT]
      at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:324) ~[220:org.opendaylight.controller.sal-distributed-datastore:1.7.3.SNAPSHOT]
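
      The log search described above can be scripted to tally which exceptions follow the marker. The following is a minimal sketch, assuming the log has already been read into a list of lines; the function name, the regular expression, and the sample lines (including the placeholder marker line) are illustrative and not part of the test suite.

```python
import re
from collections import Counter

MARKER = "starting leader movement from node"

# Matches a fully qualified exception class name at the start of a line,
# e.g. "org.opendaylight...NotLeaderException: ...". Stack frames ("at ...")
# and timestamped log lines do not match.
EXCEPTION_RE = re.compile(r"^\s*((?:[a-z_]\w*\.)+[A-Z]\w*(?:Exception|Error))\b")


def exceptions_after_marker(lines, marker=MARKER):
    """Count exception class names on lines that follow the first marker hit."""
    counts = Counter()
    seen_marker = False
    for line in lines:
        if not seen_marker:
            seen_marker = marker in line
            continue
        match = EXCEPTION_RE.match(line)
        if match:
            # Keep only the simple class name for a compact tally.
            counts[match.group(1).rsplit(".", 1)[-1]] += 1
    return counts


# Abbreviated sample; the first line is a hypothetical placeholder for the marker.
sample = [
    "<marker line> starting leader movement from node <n>",
    "org.opendaylight.controller.cluster.access.commands.OutOfSequenceEnvelopeException: Expecting envelope 0",
    "at org.opendaylight.controller.cluster.datastore.Shard.handleRequest(Shard.java:573)",
    "org.opendaylight.controller.cluster.access.commands.NotLeaderException: Actor ... is not the current leader",
]
print(exceptions_after_marker(sample))
```

      Run against the full logs from [4], a tally like this would show how often each exception type appears after leader movement begins.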

      [1] https://jenkins.opendaylight.org/releng/job/controller-csit-3node-ddb-expl-lead-movement-longevity-only-oxygen/
      [2] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-ddb-expl-lead-movement-longevity-only-oxygen/51/robot-plugin/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k1
      [3] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-ddb-expl-lead-movement-longevity-only-oxygen/51/robot-plugin/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k1
      [4] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-ddb-expl-lead-movement-longevity-only-oxygen/51/

            thapar Vishal Thapar