This is somewhat similar to CONTROLLER-1697: it happens in the "Partition and Heal" suite for a singleton service with the tell-based protocol. This time, however, the long read occurs during the isolation phase. While member-1 is being isolated, member-2 is queried and Restconf takes 46 seconds [0] to respond.
The situation is complicated by the fact that only the "all" job is running on releng, and I have not been able to reproduce this behavior on Sandbox [1], possibly because the failure has happened only twice; it did not happen in today's test.
The quarantine is seen in karaf.log [2] of the isolated member, just after the network is rejoined:
2017-09-16 13:00:11,447 | ERROR | ult-dispatcher-6 | Remoting | 155 - com.typesafe.akka.slf4j - 2.4.18 | Association to [akka.tcp://opendaylight-cluster-data@10.29.13.227:2550] with UID [-81825957] irrecoverably failed. Quarantining address.
java.util.concurrent.TimeoutException: Delivery of system messages timed out and they were dropped.
at akka.remote.ReliableDeliverySupervisor$$anonfun$gated$1.applyOrElse(Endpoint.scala:351)[164:com.typesafe.akka.remote:2.4.18]
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)[154:com.typesafe.akka.actor:2.4.18]
at akka.remote.ReliableDeliverySupervisor.aroundReceive(Endpoint.scala:203)[164:com.typesafe.akka.remote:2.4.18]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[154:com.typesafe.akka.actor:2.4.18]
at akka.actor.ActorCell.invoke(ActorCell.scala:495)[154:com.typesafe.akka.actor:2.4.18]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[154:com.typesafe.akka.actor:2.4.18]
at akka.dispatch.Mailbox.run(Mailbox.scala:224)[154:com.typesafe.akka.actor:2.4.18]
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[154:com.typesafe.akka.actor:2.4.18]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[150:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[150:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[150:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[150:org.scala-lang.scala-library:2.11.11.v20170413-090219-8a413ba7cc]
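To check other runs for the same symptom, a small script can scan a karaf.log for the quarantine message shown above. This is only a minimal sketch: the names QuarantineEvent and find_quarantines are hypothetical, and the pattern assumes the log line layout seen in this excerpt (timestamp, then level, then the Akka Remoting message).

```python
import re
from typing import Iterable, Iterator, NamedTuple


class QuarantineEvent(NamedTuple):
    """One 'Quarantining address' log entry (hypothetical helper type)."""
    timestamp: str
    address: str
    uid: str


# Assumption: lines look like the karaf.log excerpt above, i.e.
# "<timestamp> | ERROR | ... | Association to [<address>] with UID [<uid>]
#  irrecoverably failed. Quarantining address."
_QUARANTINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s*\|\s*ERROR\s*\|.*"
    r"Association to \[(?P<address>[^\]]+)\] with UID \[(?P<uid>-?\d+)\] "
    r"irrecoverably failed\. Quarantining address\."
)


def find_quarantines(lines: Iterable[str]) -> Iterator[QuarantineEvent]:
    """Yield one event per log line that reports a quarantined association."""
    for line in lines:
        match = _QUARANTINE.search(line)
        if match:
            yield QuarantineEvent(
                match["timestamp"], match["address"], match["uid"]
            )
```

For a gzipped log such as [2], the lines could be read with gzip.open(path, "rt", errors="replace") and passed straight to find_quarantines; comparing the reported timestamps with the time the network was rejoined would show whether the quarantine follows the heal, as it does here.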
Currently, I think the long read is just a consequence of features being installed, and the quarantine is expected due to the delay. After fixing CONTROLLER-1771 (and implementing the better wait in Integration/Test), I believe this symptom will not appear anymore.
[0] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-all-carbon/443/log.html.gz#s1-s42-t3-k2-k6-k1-k1-k3-k1-k3-k1-k3-k1
[1] https://jenkins.opendaylight.org/sandbox/job/controller-csit-3node-clustering-all-carbon/
[2] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-all-carbon/443/odl1_karaf.log.gz