[CONTROLLER-1665] C: write-transactions does not return if local leader moved Created: 09/May/17  Updated: 25/Jul/23  Resolved: 07/Jun/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Peter Gubka Assignee: Robert Varga
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Blocks
blocks CONTROLLER-1700 Timeout waiting for task from writer ... Resolved
is blocked by CONTROLLER-1668 Tell based protocol doesnt time out t... Resolved
is blocked by CONTROLLER-1689 stopping resolution of shard 0 on sta... Resolved
External issue ID: 8403

 Description   

The write-transactions rpc was invoked on the leader of the default module shard.
url: http://10.29.15.39:8181/restconf/operations/odl-mdsal-lowlevel-control:write-transactions, data: <input xmlns="tag:opendaylight.org,2017:controller:yang:lowlevel:control">
<id>prefix-1</id>
<seconds>30</seconds>
<transactions-per-second>1000</transactions-per-second>
<chained-transactions>True</chained-transactions>
</input>
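
For reference, the call can be reproduced by hand along these lines (a minimal sketch; the plain HttpURLConnection client and the admin/admin credentials are assumptions, not part of the test suite):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class WriteTransactionsInvoker {
    public static void main(String[] args) throws Exception {
        // Same payload as quoted above; <seconds> controls how long the rpc is expected to run.
        String payload = "<input xmlns=\"tag:opendaylight.org,2017:controller:yang:lowlevel:control\">"
                + "<id>prefix-1</id><seconds>30</seconds>"
                + "<transactions-per-second>1000</transactions-per-second>"
                + "<chained-transactions>true</chained-transactions></input>";
        URL url = new URL("http://10.29.15.39:8181/restconf/operations/odl-mdsal-lowlevel-control:write-transactions");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/xml");
        // admin/admin is an assumed default credential for the sketch.
        conn.setRequestProperty("Authorization", "Basic "
                + Base64.getEncoder().encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8)));
        // The rpc blocks for roughly <seconds>, so the read timeout must be longer than that.
        conn.setReadTimeout(90_000);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}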

Within this time the leader was moved to a different node. But after 30s the write-transactions rpc did not return, and after 90s it was aborted from the robot side.
https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon-2nd/12/archives/log.html.gz#s1-s1-t1-k2-k8

The bug is similar to https://bugs.opendaylight.org/show_bug.cgi?id=8372, but that time there was no leader movement; only write-transactions was invoked.

Logs of the transaction producer node:
https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon-2nd/12/archives/odl1_karaf.log.gz
2017-05-09 04:55:51,788 | INFO | h for user karaf | command | 266 - org.apache.karaf.log.command - 3.0.8 | ROBOT MESSAGE: Starting test Local_To_Remote_Movement



 Comments   
Comment by Peter Gubka [ 09/May/17 ]

The same behavior happened when write-transactions was run on the default shard leader node and this node was isolated (iptables).
Instead of the rpc being aborted after the hard timeout, it was aborted after 150s from the robot side.
https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon-2nd/13/archives/log.html.gz#s1-s1-t1-k2-k18-k3

Logs:
https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon-2nd/13/archives/odl1_karaf.log.gz
2017-05-09 05:16:59,386 | INFO | h for user karaf | command | 266 - org.apache.karaf.log.command - 3.0.8 | ROBOT MESSAGE: Starting test Producer_On_Shard_Leader_Node_ChainedTx

Comment by Robert Varga [ 15/May/17 ]

For the tell-based protocol, after BUG-8422 the deadline for requests is 120 seconds, not 30 seconds (which is the reconnect timer).

Comment by Vratko Polak [ 18/May/17 ]

> the deadline for requests is 120 seconds

Re-opening.
write-transactions has not returned [4] within 180 seconds (30 seconds of work, so 150 seconds of overhead, which is more than 120 seconds).
The scenario was removing a shard replica on the leader; module-based shard with tell-based protocol.

[4] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/694/archives/log.html.gz#s1-s20-t1-k2-k8

Comment by Robert Varga [ 19/May/17 ]

One troublesome thing is:

java.util.ConcurrentModificationException
at java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:643)[:1.8.0_121]
at com.google.common.collect.Iterators$5.next(Iterators.java:558)[57:com.google.guava:18.0.0]
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction.replayMessages(AbstractProxyTransaction.java:538)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.Carbon]
at org.opendaylight.controller.cluster.databroker.actors.dds.ProxyHistory$ReconnectCohort.replayRequests(ProxyHistory.java:232)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.Carbon]
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractClientHistory$1.replayRequests(AbstractClientHistory.java:310)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.Carbon]
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractDataStoreClientBehavior.lambda$connectionUp$0(AbstractDataStoreClientBehavior.java:147)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.Carbon]
at org.opendaylight.controller.cluster.access.client.ClientActorBehavior.backendConnectFinished(ClientActorBehavior.java:280)[197:org.opendaylight.controller.cds-access-client:1.1.0.Carbon]
at org.opendaylight.controller.cluster.access.client.ClientActorBehavior.lambda$null$3(ClientActorBehavior.java:329)[197:org.opendaylight.controller.cds-access-client:1.1.0.Carbon]
at org.opendaylight.controller.cluster.access.client.ClientActorBehavior.onReceiveCommand(ClientActorBehavior.java:118)[197:org.opendaylight.controller.cds-access-client:1.1.0.Carbon]
at org.opendaylight.controller.cluster.access.client.ClientActorBehavior.onReceiveCommand(ClientActorBehavior.java:42)[197:org.opendaylight.controller.cds-access-client:1.1.0.Carbon]
at org.opendaylight.controller.cluster.access.client.AbstractClientActor.onReceiveCommand(AbstractClientActor.java:59)[197:org.opendaylight.controller.cds-access-client:1.1.0.Carbon]
at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)[180:com.typesafe.akka.persistence:2.4.17]
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.Actor$class.aroundReceive(Actor.scala:497)[173:com.typesafe.akka.actor:2.4.17]
at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)[180:com.typesafe.akka.persistence:2.4.17]
at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:664)[180:com.typesafe.akka.persistence:2.4.17]
at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:183)[180:com.typesafe.akka.persistence:2.4.17]
at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)[180:com.typesafe.akka.persistence:2.4.17]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[173:com.typesafe.akka.actor:2.4.17]
at akka.actor.ActorCell.invoke(ActorCell.scala:495)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.run(Mailbox.scala:224)[173:com.typesafe.akka.actor:2.4.17]
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[173:com.typesafe.akka.actor:2.4.17]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[169:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]

I'll try to see what the heck is going on.

Comment by Robert Varga [ 19/May/17 ]

https://git.opendaylight.org/gerrit/57438

Comment by Vratko Polak [ 22/May/17 ]

> https://git.opendaylight.org/gerrit/57438

Even after that was merged, this Bug still occurs [0] (module-based shard, tell-based protocol, graceful leader move).
There is nothing suspicious in karaf.log [1] after 00:38:39,727.

[0] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/715/archives/log.html.gz#s1-s24-t1-k2-k10
[1] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/715/archives/odl1_karaf.log.gz

Comment by Peter Gubka [ 22/May/17 ]

There is a sandbox run for patches 57598+99
https://jenkins.opendaylight.org/sandbox/job/controller-csit-3node-clustering-only-carbon-3rd/3/

Comment by Peter Gubka [ 23/May/17 ]

Another sandbox run (odl includes patches 57598+99) with debug logging for org.opendaylight.controller.cluster.databroker.actors.dds and org.opendaylight.controller.cluster.access.client
https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon-3rd/4/archives/

Comment by Vratko Polak [ 23/May/17 ]

Still occasionally happens [9] when a (module-based, tell protocol) shard replica is removed from the leader member.

Possibly just because the fix [10] has not been merged yet.

[9] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/719/archives/log.html.gz#s1-s22-t1-k2-k9
[10] https://git.opendaylight.org/gerrit/57423

Comment by Vratko Polak [ 23/May/17 ]

> shard replica is removed from the leader member.

In controller-csit-3node-clustering-only-carbon/720 this happened reliably, both for the module-based and the prefix-based shard, but only when the writer/producer was located on the leader member.

Comment by Peter Gubka [ 23/May/17 ]

New run with debug logs here:
https://jenkins.opendaylight.org/sandbox/job/controller-csit-3node-clustering-only-carbon-2nd/8/

The rest response from the write-transactions rpc was not returned: https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon-2nd/8/archives/log.html.gz#s1-s2-t1-k2-k11

Odl zip from: https://jenkins.opendaylight.org/releng/view/integration/job/integration-multipatch-test-carbon/37/

Comment by Peter Gubka [ 25/May/17 ]

Debug logs including akka
https://jenkins.opendaylight.org/sandbox/job/controller-csit-3node-clustering-only-carbon-2nd/11/

Odl built from controller=63/57763/2:99/57699/3 at
https://jenkins.opendaylight.org/releng/view/integration/job/integration-multipatch-test-carbon/43

Not-returned rpc
https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon-2nd/11/archives/log.html.gz#s1-s2-t1-k2-k11

Comment by Peter Gubka [ 25/May/17 ]

New debug logs (with akka)
https://jenkins.opendaylight.org/sandbox/job/controller-csit-3node-clustering-only-carbon-2nd/14/

Odl built from controller=70/57770/4:99/57699/3 at
https://jenkins.opendaylight.org/releng/view/integration/job/integration-multipatch-test-carbon/45/

Not-returned rpc
https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon-2nd/14/archives/log.html.gz#s1-s2-t1-k2-k11

Comment by Robert Varga [ 25/May/17 ]

It seems we are hitting TransmitQueue throttling when sending a purge request after the transaction completes. Since there is a ton of requests in the queue at that time, we naturally get throttled.

https://git.opendaylight.org/gerrit/57822 eliminates that delay and adds a debug message when we get blocked for more than 100 msec.
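
For intuition only, a minimal self-contained sketch of the throttling effect (the ThrottlingQueue class and its capacity are invented for the illustration, not the actual TransmitQueue code): a late purge request enqueued behind a full queue of outstanding requests simply waits until earlier entries are acknowledged.

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeUnit;

// Illustrative bounded transmit queue; names and the capacity limit are invented.
final class ThrottlingQueue<T> {
    private final Queue<T> pending = new ArrayDeque<>();
    private final int capacity;

    ThrottlingQueue(int capacity) {
        this.capacity = capacity;
    }

    synchronized void enqueue(T request) throws InterruptedException {
        long start = System.nanoTime();
        // When the queue is already full of in-flight requests (e.g. the 30s worth of
        // write-transactions work), a late-comer such as a purge request is delayed here.
        while (pending.size() >= capacity) {
            wait();
        }
        long blockedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        if (blockedMillis > 100) {
            System.err.println("enqueue blocked for " + blockedMillis + " ms");
        }
        pending.add(request);
    }

    synchronized T acknowledgeNext() {
        T head = pending.poll();
        notifyAll();
        return head;
    }
}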

Comment by Vratko Polak [ 26/May/17 ]

This still happens [0], also on Carbon post-release codebase [1].
The difference is that this now does not happen after remove-shard-replica (CONTROLLER-1692 is seen there instead), only after make-leader-local or become-prefix-leader.

[0] https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/15/archives/log.html.gz#s1-s24-t1-k2-k10
[1] https://git.opendaylight.org/gerrit/#/c/57822/1

Comment by Robert Varga [ 29/May/17 ]

In general this is a manifestation of the client actor dying and being restarted: the restarted instance has no state, so there is no one to complete the futures, as the new instance has no knowledge of them.
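
As a generic illustration of that failure mode (invented names, not the controller's actual classes): a response can only complete a future that the current instance recorded, so a fresh instance starting with an empty request map leaves the caller hanging forever.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Generic illustration of why an actor restart strands outstanding futures.
final class ClientState {
    private final Map<Long, CompletableFuture<String>> inFlight = new HashMap<>();

    CompletableFuture<String> send(long sequence) {
        CompletableFuture<String> future = new CompletableFuture<>();
        inFlight.put(sequence, future);   // completion depends on this in-memory entry
        return future;
    }

    void onResponse(long sequence, String response) {
        CompletableFuture<String> future = inFlight.remove(sequence);
        if (future == null) {
            // This is what the restarted instance does: the response refers to a request
            // it never recorded, so the original caller's future is never completed.
            System.err.println("Ignoring unknown response for sequence " + sequence);
            return;
        }
        future.complete(response);
    }
}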

The latest failure is an ISE splat during replay of a transaction which is waiting for purge to complete, as the successor txchain is not transitioned to account for the transaction being closed:

2017-05-29 03:22:27,394 | DEBUG | lt-dispatcher-22 | ProxyHistory | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | LocalHistoryIdentifier{client=ClientIdentifier{frontend=member-1-frontend-datastore-config, generation=0}, history=2, cookie=0} creating successor transaction proxy for RemoteProxyTransaction{identifier=member-1-datastore-config-fe-0-chn-2-txn-2-0, state=successor}
2017-05-29 03:22:27,394 | DEBUG | lt-dispatcher-22 | ProxyHistory | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Proxy org.opendaylight.controller.cluster.databroker.actors.dds.ProxyHistory$Local@3d6366fb open transaction LocalReadWriteProxyTransaction{identifier=member-1-datastore-config-fe-0-chn-2-txn-2-0, state=open}
2017-05-29 03:22:27,394 | DEBUG | lt-dispatcher-22 | ProxyHistory | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Allocated proxy member-1-datastore-config-fe-0-chn-2-txn-2-0 for transaction member-1-datastore-config-fe-0-chn-2-txn-2-0
2017-05-29 03:22:27,394 | DEBUG | lt-dispatcher-22 | ProxyHistory | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | LocalHistoryIdentifier{client=ClientIdentifier{frontend=member-1-frontend-datastore-config, generation=0}, history=2, cookie=0} created successor transaction proxy LocalReadWriteProxyTransaction{identifier=member-1-datastore-config-fe-0-chn-2-txn-2-0, state=open}
2017-05-29 03:22:27,394 | DEBUG | lt-dispatcher-22 | AbstractProxyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Replaying queued request TransactionPurgeRequest{target=member-1-datastore-config-fe-0-chn-2-txn-2-0, sequence=2, replyTo=Actor[akka://opendaylight-cluster-data/user/$a#1472136728]} to successor LocalReadWriteProxyTransaction{identifier=member-1-datastore-config-fe-0-chn-2-txn-2-0, state=open}
2017-05-29 03:22:27,394 | DEBUG | lt-dispatcher-22 | LocalReadWriteProxyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Applying replayed request TransactionPurgeRequest{target=member-1-datastore-config-fe-0-chn-2-txn-2-0, sequence=2, replyTo=Actor[akka://opendaylight-cluster-data/user/$a#1472136728]}
2017-05-29 03:22:27,394 | DEBUG | lt-dispatcher-22 | AbstractProxyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Transaction LocalReadWriteProxyTransaction{identifier=member-1-datastore-config-fe-0-chn-2-txn-2-0, state=open} allocated sequence 0
2017-05-29 03:22:27,394 | DEBUG | lt-dispatcher-22 | AbstractProxyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Transaction proxy LocalReadWriteProxyTransaction{identifier=member-1-datastore-config-fe-0-chn-2-txn-2-0, state=open} enqueing request TransactionPurgeRequest{target=member-1-datastore-config-fe-0-chn-2-txn-2-0, sequence=0, replyTo=Actor[akka://opendaylight-cluster-data/user/$a#1472136728]} callback org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction$$Lambda$135/333503310@5ed81725
2017-05-29 03:22:27,395 | DEBUG | lt-dispatcher-22 | AbstractClientConnection | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.SNAPSHOT | member-1-frontend-datastore-config: scheduling timeout in 30 seconds
2017-05-29 03:22:27,395 | DEBUG | lt-dispatcher-22 | TransmitQueue | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.SNAPSHOT | Queue ConnectionEntry{request=TransactionPurgeRequest{target=member-1-datastore-config-fe-0-chn-2-txn-2-0, sequence=0, replyTo=Actor[akka://opendaylight-cluster-data/user/$a#1472136728]}, enqueuedTicks=1989409436188} transmitting entry {}
2017-05-29 03:22:27,398 | DEBUG | lt-dispatcher-22 | ProxyHistory | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | LocalHistoryIdentifier{client=ClientIdentifier{frontend=member-1-frontend-datastore-config, generation=0}, history=2, cookie=0} creating successor transaction proxy for RemoteProxyTransaction{identifier=member-1-datastore-config-fe-0-chn-2-txn-3-0, state=successor}
2017-05-29 03:22:27,399 | DEBUG | lt-dispatcher-22 | AbstractClientHistory | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Client history org.opendaylight.controller.cluster.databroker.actors.dds.SingleClientHistory@125db7b7 finishing reconnect to ConnectedClientConnection{client=ClientIdentifier{frontend=member-1-frontend-datastore-config, generation=0}, cookie=0, backend=ShardBackendInfo{actor=Actorakka://opendaylight-cluster-data/user/shardmanager-config/member-1-shard-default-config#-773763026, sessionId=1, version=BORON, maxMessages=1000, cookie=0, shard=default, dataTree present=true}}
2017-05-29 03:22:27,399 | DEBUG | lt-dispatcher-22 | ProxyHistory | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Finished reconnecting proxy history org.opendaylight.controller.cluster.databroker.actors.dds.ProxyHistory$ReconnectCohort@41c76af8
2017-05-29 03:22:27,399 | DEBUG | lt-dispatcher-22 | AbstractClientHistory | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Client history org.opendaylight.controller.cluster.databroker.actors.dds.ClientLocalHistory@61fea21f finishing reconnect to ConnectedClientConnection{client=ClientIdentifier{frontend=member-1-frontend-datastore-config, generation=0}, cookie=0, backend=ShardBackendInfo{actor=Actorakka://opendaylight-cluster-data/user/shardmanager-config/member-1-shard-default-config#-773763026, sessionId=1, version=BORON, maxMessages=1000, cookie=0, shard=default, dataTree present=true}}
2017-05-29 03:22:27,399 | DEBUG | lt-dispatcher-22 | AbstractProxyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Finishing reconnect of proxy RemoteProxyTransaction{identifier=member-1-datastore-config-fe-0-chn-2-txn-2-0, state=successor}
2017-05-29 03:22:27,399 | DEBUG | lt-dispatcher-22 | AbstractProxyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Finishing reconnect of proxy RemoteProxyTransaction{identifier=member-1-datastore-config-fe-0-chn-2-txn-3-0, state=successor}

[...]

2017-05-29 03:22:27,499 | ERROR | lt-dispatcher-30 | OneForOneStrategy | 174 - com.typesafe.akka.slf4j - 2.4.17 | Proxy org.opendaylight.controller.cluster.databroker.actors.dds.ProxyHistory$Local@3d6366fb has LocalReadWriteProxyTransaction{identifier=member-1-datastore-config-fe-0-chn-2-txn-2-0, state=open} currently open
java.lang.IllegalStateException: Proxy org.opendaylight.controller.cluster.databroker.actors.dds.ProxyHistory$Local@3d6366fb has LocalReadWriteProxyTransaction{identifier=member-1-datastore-config-fe-0-chn-2-txn-2-0, state=open} currently open
at com.google.common.base.Preconditions.checkState(Preconditions.java:197)[57:com.google.guava:18.0.0]
at org.opendaylight.controller.cluster.databroker.actors.dds.ProxyHistory$Local.doCreateTransactionProxy(ProxyHistory.java:90)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.databroker.actors.dds.ProxyHistory.createTransactionProxy(ProxyHistory.java:367)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.databroker.actors.dds.ProxyHistory$ReconnectCohort.replayRequests(ProxyHistory.java:235)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractClientHistory$1.replayRequests(AbstractClientHistory.java:311)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractDataStoreClientBehavior.lambda$connectionUp$0(AbstractDataStoreClientBehavior.java:147)[199:org.opendaylight.controller.sal-distributed-datastore:1.5.0.SNAPSHOT]
at org.opendaylight.controller.cluster.access.client.ClientActorBehavior.backendConnectFinished(ClientActorBehavior.java:293)[197:org.opendaylight.controller.cds-access-client:1.1.0.SNAPSHOT]

after which the actor restarts and receives:

2017-05-29 03:22:27,512 | DEBUG | lt-dispatcher-22 | AbstractClientActor | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.SNAPSHOT | member-1-frontend-datastore-config: switched from org.opendaylight.controller.cluster.access.client.SavingClientActorBehavior@140e3fe1 to org.opendaylight.controller.cluster.databroker.actors.dds.DistributedDataStoreClientBehavior@329ed459
2017-05-29 03:22:27,793 | INFO | ult-dispatcher-4 | ClientActorBehavior | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.SNAPSHOT | member-1-frontend-datastore-config: Ignoring unknown response Envelope{sessionId=1, txSequence=0, message=TransactionPurgeResponse{target=member-1-datastore-config-fe-0-chn-2-txn-2-0, sequence=0}}

Comment by Robert Varga [ 31/May/17 ]

https://git.opendaylight.org/gerrit/57994

Comment by Vratko Polak [ 05/Jun/17 ]

> https://git.opendaylight.org/gerrit/57994

After that was merged, this Bug stopped appearing in most scenarios, except one.

The scenario uses a single writer located on a follower, and then the leader is moved to that member. This scenario fails both for the module-based and the prefix-based shard (tell-based protocol), and in both cases UnreachableMember happens shortly after the movement is started.
Possibly, akka timer tweaking could get rid of this Bug.

Here is a karaf.log [24] segment from recent module-based RelEng test:

2017-06-04 14:13:16,733 | INFO | qtp897580718-78 | ClusterAdminRpcService | 201 - org.opendaylight.controller.sal-cluster-admin-impl - 1.5.1.SNAPSHOT | Moving leader to local node member-2 for shard default, datastoreType Config
2017-06-04 14:13:19,527 | WARN | ult-dispatcher-3 | ClusterCoreDaemon | 174 - com.typesafe.akka.slf4j - 2.4.18 | Cluster Node [akka.tcp://opendaylight-cluster-data@10.29.14.91:2550] - Marking node(s) as UNREACHABLE [Member(address = akka.tcp://opendaylight-cluster-data@10.29.15.16:2550, status = Up)]. Node roles [member-2]
2017-06-04 14:13:19,530 | INFO | lt-dispatcher-24 | ShardManager | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.SNAPSHOT | Received UnreachableMember: memberName MemberName{name=member-1}, address: akka.tcp://opendaylight-cluster-data@10.29.15.16:2550
2017-06-04 14:13:19,530 | INFO | lt-dispatcher-31 | ShardManager | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.SNAPSHOT | Received UnreachableMember: memberName MemberName{name=member-1}, address: akka.tcp://opendaylight-cluster-data@10.29.15.16:2550
2017-06-04 14:13:19,531 | INFO | lt-dispatcher-24 | EntityOwnershipShard | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.1.SNAPSHOT | member-2-shard-entity-ownership-operational: onPeerDown: PeerDown [memberName=member-1, peerId=member-1-shard-entity-ownership-operational]
2017-06-04 14:13:20,526 | INFO | ult-dispatcher-5 | kka://opendaylight-cluster-data) | 174 - com.typesafe.akka.slf4j - 2.4.18 | Cluster Node [akka.tcp://opendaylight-cluster-data@10.29.14.91:2550] - Marking node(s) as REACHABLE [Member(address = akka.tcp://opendaylight-cluster-data@10.29.15.16:2550, status = Up)]. Node roles [member-2]
2017-06-04 14:13:20,527 | INFO | ult-dispatcher-4 | ShardManager | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.SNAPSHOT | Received ReachableMember: memberName MemberName{name=member-1}, address: akka.tcp://opendaylight-cluster-data@10.29.15.16:2550

[24] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/733/odl2_karaf.log.gz

Comment by Vratko Polak [ 05/Jun/17 ]

Link to a failing Sandbox longevity job [25] with code [26] not merged yet and verbose logs.

[25] https://jenkins.opendaylight.org/sandbox/job/controller-csit-3node-ddb-expl-lead-movement-longevity-only-carbon/2/
[26] https://git.opendaylight.org/gerrit/58194

Comment by Robert Varga [ 05/Jun/17 ]

The last patch had a thinko; https://git.opendaylight.org/gerrit/58274 fixes that.
