[CONTROLLER-1695] tell-based-protocol: ConcurrentModificationException during replay Created: 23/May/17  Updated: 25/Jul/23  Resolved: 30/May/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Robert Varga Assignee: Robert Varga
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: File cme.txt.gz    
Issue Links:
Blocks
blocks CONTROLLER-1664 C: OutOfOrderRequestException: Expect... Resolved
External issue ID: 8538

 Description   

When transitioning from a connecting connection to a local connection, if we have outstanding read replies and immediate aborts, the replay will fail with a ConcurrentModificationException:

java.util.ConcurrentModificationException
at java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)[:1.8.0_121]
at java.util.LinkedHashMap$LinkedValueIterator.next(LinkedHashMap.java:747)[:1.8.0_121]
at org.opendaylight.controller.cluster.databroker.actors.dds.ProxyHistory$ReconnectCohort.replayRequests(ProxyHistory.java:232)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractClientHistory$1.replayRequests(AbstractClientHistory.java:311)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractDataStoreClientBehavior.lambda$connectionUp$0(AbstractDataStoreClientBehavior.java:147)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.access.client.ClientActorBehavior.backendConnectFinished(ClientActorBehavior.java:293)[197:org.opendaylight.controller.cds-access-client:1.1.0.SNAPSHOT]



 Comments   
Comment by Robert Varga [ 23/May/17 ]

Attachment cme.txt.gz has been added with description: Cleaned-up karaf.log with relevant data

Comment by Robert Varga [ 23/May/17 ]

The reason for this failure is apparent here:

2017-05-23 12:25:16,678 | DEBUG | ult-dispatcher-5 | ProxyHistory | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | LocalHistoryIdentifier{client=ClientIdentifier

{frontend=member-2-frontend-datastore-config, generation=0}

, history=0, cookie=0} created successor transaction proxy LocalReadOnlyProxyTransaction

{identifier=member-2-datastore-config-fe-0-txn-0-0, state=open}
2017-05-23 12:25:16,678 | DEBUG | ult-dispatcher-5 | AbstractProxyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Replaying queued request ReadTransactionRequest{target=member-2-datastore-config-fe-0-txn-0-0, sequence=0, replyTo=Actor[akka://opendaylight-cluster-data/user/$a#-555098784], path=/(urn:opendaylight:yang:aaa:cert?revision=2015-11-26)aaa-cert-service-config} to successor LocalReadOnlyProxyTransaction{identifier=member-2-datastore-config-fe-0-txn-0-0, state=open}

2017-05-23 12:25:16,679 | DEBUG | ult-dispatcher-5 | RemoteProxyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Read request completed with ReadTransactionSuccess

{target=member-2-datastore-config-fe-0-txn-0-0, sequence=0}

2017-05-23 12:25:16,707 | DEBUG | ult-dispatcher-5 | AbstractProxyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Transaction RemoteProxyTransaction

{identifier=member-2-datastore-config-fe-0-txn-0-0, state=successor} allocated sequence 1
2017-05-23 12:25:16,707 | DEBUG | ult-dispatcher-5 | AbstractProxyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Transaction proxy RemoteProxyTransaction{identifier=member-2-datastore-config-fe-0-txn-0-0, state=successor}

sending request ModifyTransactionRequest

{target=member-2-datastore-config-fe-0-txn-0-0, sequence=1, replyTo=Actor[akka://opendaylight-cluster-data/user/$a#-555098784], operations=[], protocol=ABORT}

callback org.opendaylight.controller.cluster.databroker.actors.dds.RemoteProxyTransaction$$Lambda$119/1241459742@11834a0c
2017-05-23 12:25:16,708 | DEBUG | ult-dispatcher-5 | AbstractClientConnection | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.SNAPSHOT | member-2-frontend-datastore-config: timer already scheduled
2017-05-23 12:25:16,708 | DEBUG | ult-dispatcher-5 | ProxyHistory | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Proxy org.opendaylight.controller.cluster.databroker.actors.dds.ProxyHistory$RemoteSingle@434df869 aborting transaction RemoteProxyTransaction

{identifier=member-2-datastore-config-fe-0-txn-0-0, state=successor}

I.e. transaction abort is completing immediately on the replay thread and touches the cohorts map, through which replay is currently iterating – hence causing CME as soon as we try to move to the next proxy.

Comment by Robert Varga [ 24/May/17 ]

https://git.opendaylight.org/gerrit/57759
https://git.opendaylight.org/gerrit/57770

Generated at Wed Feb 07 19:56:13 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.