Bug
Resolution: Unresolved
High
None
Oxygen SR3
The DOMDataBroker Explicit Leader Movement test, controller-ddb-expl-lead-movement-longevity, has been failing fairly consistently [1]. The purpose of the test, from the test documentation:
The goal is to ensure that applications do not observe disruption when a shard leader is moved as the result of explicit application request. This is performed by having a steady-stream producer execute operations against the shard and then initiate shard leader shutdown, then the producer is shut down cleanly.
Although the job shows 3/4 passing, those 3 are for restarting ODL with tell; the actual test case has been failing consistently, with only one pass, on 28th July. The steps described above are repeated in a loop, and the test fails 1097 times during the entire ~14-hour run with error 500 [2].
Update:
-----------
Test steps detailed in [3].
ODL Logs [4]
As per [3], searching the ODL logs for 'starting leader movement from node' gives us:
2018-08-04T23:40:26,930 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-34 | ClientActorBehavior | 191 - org.opendaylight.controller.cds-access-client - 1.3.3.SNAPSHOT | member-1-frontend-datastore-config: connection ConnectedClientConnection{client=ClientIdentifier
org.opendaylight.controller.cluster.access.commands.OutOfSequenceEnvelopeException: Expecting envelope 0
at org.opendaylight.controller.cluster.datastore.LeaderFrontendState.checkRequestSequence(LeaderFrontendState.java:98) [220:org.opendaylight.controller.sal-distributed-datastore:1.7.3.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.LeaderFrontendState.handleTransactionRequest(LeaderFrontendState.java:191) [220:org.opendaylight.controller.sal-distributed-datastore:1.7.3.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.Shard.handleRequest(Shard.java:573) [220:org.opendaylight.controller.sal-distributed-datastore:1.7.3.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.Shard.handleRequestEnvelope(Shard.java:398) [220:org.opendaylight.controller.sal-distributed-datastore:1.7.3.SNAPSHOT]
Also:
2018-08-05T01:17:16,637 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-36 | ClientActorBehavior | 191 - org.opendaylight.controller.cds-access-client - 1.3.3.SNAPSHOT | member-1-frontend-datastore-config: connection ConnectedClientConnection{client=ClientIdentifier{frontend=member-1-frontend-datastore-config, generation=0}, cookie=0, backend=ShardBackendInfo{actor=Actor[akka.tcp://opendaylight-cluster-data@10.30.170.171:2550/user/shardmanager-config/member-2-shard-default-config#-1686300808], sessionId=71, version=BORON, maxMessages=1000, cookie=0, shard=default, dataTree=absent}} indicated no leadership, reconnecting it
org.opendaylight.controller.cluster.access.commands.NotLeaderException: Actor Actor[akka://opendaylight-cluster-data/user/shardmanager-config/member-2-shard-default-config#-1686300808] is not the current leader
at org.opendaylight.controller.cluster.datastore.Shard.handleRequest(Shard.java:566) ~[220:org.opendaylight.controller.sal-distributed-datastore:1.7.3.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.Shard.handleRequestEnvelope(Shard.java:398) ~[220:org.opendaylight.controller.sal-distributed-datastore:1.7.3.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:324) ~[220:org.opendaylight.controller.sal-distributed-datastore:1.7.3.SNAPSHOT]
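To repeat the log search above across the whole run, something like the following Python sketch may help. It is not part of the test suite; the function name summarize_log and the regular expression are assumptions made for illustration. It records the line numbers where the 'starting leader movement from node' marker appears and counts the exceptions that head a stack trace, such as OutOfSequenceEnvelopeException and NotLeaderException.

```python
import re
from collections import Counter

# Marker string taken from the test steps in [3].
MARKER = "starting leader movement from node"

# Matches a line that starts with a fully qualified exception class name,
# e.g. "org.opendaylight...NotLeaderException: ...". Stack frames start
# with "at " and log records start with a timestamp, so neither matches.
EXCEPTION_RE = re.compile(r"(?:[a-z_]\w*\.)+([A-Z]\w*Exception)\b")


def summarize_log(lines):
    """Return (marker_line_numbers, exception_counts) for an iterable of log lines.

    Hypothetical helper: line numbers are 1-based, and exceptions are keyed
    by their simple class name.
    """
    marker_hits = []
    exceptions = Counter()
    for number, line in enumerate(lines, start=1):
        if MARKER in line:
            marker_hits.append(number)
        match = EXCEPTION_RE.match(line.strip())
        if match:
            exceptions[match.group(1)] += 1
    return marker_hits, exceptions
```

Because the function accepts any iterable of strings, an open file handle for a downloaded karaf log from [4] can be passed in directly; the exact file name depends on how the logs were fetched.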
[1] https://jenkins.opendaylight.org/releng/job/controller-csit-3node-ddb-expl-lead-movement-longevity-only-oxygen/
[2] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-ddb-expl-lead-movement-longevity-only-oxygen/51/robot-plugin/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k1
[3] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-ddb-expl-lead-movement-longevity-only-oxygen/51/robot-plugin/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k1-k1-k1
[4] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-ddb-expl-lead-movement-longevity-only-oxygen/51/