[CONTROLLER-1684] OutOfOrderRequestException when removing shard replica under load Created: 17/May/17  Updated: 25/Jul/23  Resolved: 19/May/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Vratko Polak Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Duplicate
duplicates CONTROLLER-1664 C: OutOfOrderRequestException: Expect... Resolved
External issue ID: 8486

 Description   

This is similar to CONTROLLER-1664 but the scenario is different.

This time we have a transaction producer (module-based shard, tell-based protocol) on the same member as the shard leader, and we call remove-shard-replica.

Robot sees OutOfOrderRequestException [0].
karaf.log [1] shows more. First there are many warnings like this one:
2017-05-17 02:49:32,274 | WARN | lt-dispatcher-31 | TransmitQueue | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.Carbon | No request matching Envelope{sessionId=1, txSequence=7d0, message=ModifyTransactionSuccess{target=member-2-datastore-config-fe-0-chn-2-txn-1001-0, sequence=0}} found, ignoring response

The error appears in the middle of these warnings, starting with:
2017-05-17 02:49:32,505 | WARN | lt-dispatcher-23 | ConcurrentDOMDataBroker | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.Carbon | Tx: DOM-CHAIN-1-1216 Error during phase CAN_COMMIT, starting Abort
org.opendaylight.controller.cluster.access.commands.OutOfOrderRequestException: Expecting request 0
at org.opendaylight.controller.cluster.datastore.AbstractFrontendHistory.<clinit>(AbstractFrontendHistory.java:47)
at org.opendaylight.controller.cluster.datastore.FrontendHistoryMetadataBuilder.toLeaderState(FrontendHistoryMetadataBuilder.java:85)
at org.opendaylight.controller.cluster.datastore.FrontendClientMetadataBuilder.toLeaderState(FrontendClientMetadataBuilder.java:172)
at org.opendaylight.controller.cluster.datastore.FrontendMetadata.lambda$toLeaderState$0(FrontendMetadata.java:128)
at com.google.common.collect.Maps$7.transformEntry(Maps.java:1812)
at com.google.common.collect.Maps$10.getValue(Maps.java:1857)
at java.util.HashMap.putMapEntries(HashMap.java:513)
at java.util.HashMap.<init>(HashMap.java:489)
at org.opendaylight.controller.cluster.datastore.FrontendMetadata.toLeaderState(FrontendMetadata.java:128)
at org.opendaylight.controller.cluster.datastore.Shard.onLeaderChanged(Shard.java:830)
at org.opendaylight.controller.cluster.raft.RaftActor.handleBehaviorChange(RaftActor.java:517)

[0] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/692/archives/log.html.gz#s1-s20-t1-k2-k9-k1
[1] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/692/archives/odl2_karaf.log.gz
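
When going through a large karaf.log like [1], it can help to separate the TransmitQueue warnings from the stack trace lines. The following is only a minimal sketch, assuming the six pipe-delimited fields seen in the excerpts above (timestamp, level, thread, logger, bundle, message); the names KarafEntry, parse_karaf_line and unmatched_responses are hypothetical helpers, not part of any OpenDaylight tooling.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional


class KarafEntry(NamedTuple):
    """One karaf.log entry, split on the '|' separators (assumed layout)."""
    timestamp: str
    level: str
    thread: str
    logger: str
    bundle: str
    message: str


def parse_karaf_line(line: str) -> Optional[KarafEntry]:
    """Split a log line into six fields.

    Returns None for continuation lines such as stack trace frames.
    """
    # maxsplit=5 keeps any '|' inside the message itself intact.
    parts = [p.strip() for p in line.split("|", 5)]
    if len(parts) != 6 or not re.match(r"\d{4}-\d{2}-\d{2} ", parts[0]):
        return None
    return KarafEntry(*parts)


def unmatched_responses(lines: Iterable[str]) -> Iterator[KarafEntry]:
    """Yield TransmitQueue warnings about responses with no matching request."""
    for line in lines:
        entry = parse_karaf_line(line)
        if (entry is not None
                and entry.level == "WARN"
                and entry.logger == "TransmitQueue"
                and "No request matching" in entry.message):
            yield entry
```

For a gzipped log, the lines could be supplied with gzip.open(path, "rt") from the standard library and passed straight to unmatched_responses.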



 Comments   
Comment by Vratko Polak [ 17/May/17 ]

The next run shows UnreachableMember (CONTROLLER-1645 in principle) in karaf.log [2]:
2017-05-17 13:12:53,483 | INFO | lt-dispatcher-30 | ShardManager | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.Carbon | Received UnreachableMember: memberName MemberName{name=member-2}, address: akka.tcp://opendaylight-cluster-data@10.29.15.116:2550

But for some reason OutOfOrderRequestException is no longer visible in karaf.log (at least on the original leader member); it appears only in the response [3] seen by Robot.

[2] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/693/archives/odl3_karaf.log.gz
[3] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/693/archives/log.html.gz#s1-s20-t1-k2-k9-k1

Comment by Vratko Polak [ 17/May/17 ]

This also happened [4] after make-leader-local (still module-based shards with tell-based protocol), but there is UnreachableMember in karaf.log [5]:
2017-05-17 13:20:20,241 | INFO | lt-dispatcher-23 | ShardManager | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.Carbon | Received UnreachableMember: memberName MemberName{name=member-1}, address: akka.tcp://opendaylight-cluster-data@10.29.14.225:2550

[4] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/693/archives/log.html.gz#s1-s24-t1-k2-k9-k1
[5] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/693/archives/odl3_karaf.log.gz
