Bug
Resolution: Done
Operating System: All
Platform: All
8606
This symptom affects current cluster testing, but it may no longer be critical once other bugs are fixed. So far it has only been seen on Sandbox.
The Robot symptom "Leader not found" [0] is similar to CONTROLLER-1693, but this time no "UnreachableMember" appears in karaf.log [1] (perhaps because of CONTROLLER-1703).
CONTROLLER-1675 is also similar, but here the call that moves the leader (remove-shard-replica in this case) does not fail; karaf.log only states that the leadership transfer was unsuccessful:
2017-06-06 07:51:36,160 | WARN | lt-dispatcher-28 | aftActorLeadershipTransferCohort | 193 - org.opendaylight.controller.sal-akka-raft - 1.5.1.SNAPSHOT | member-1-shard-default-config: Failed to transfer leadership in 10.01 s
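To check other karaf.log files for the same symptom combination (a failed leadership transfer without any "UnreachableMember" event), a small scan like the one below may help. This is a minimal sketch that assumes the pipe-separated log layout of the WARN line above; the function name `scan_karaf_log` is hypothetical and not part of OpenDaylight.

```python
import re
from typing import Iterable, List, Tuple

# Assumes the karaf.log layout quoted above, where the message follows the
# last "|" separator, e.g.
# "... | member-1-shard-default-config: Failed to transfer leadership in 10.01 s"
_TRANSFER_FAILED = re.compile(
    r"\|\s*(?P<shard>[\w-]+): Failed to transfer leadership in (?P<secs>[\d.]+) s"
)


def scan_karaf_log(lines: Iterable[str]) -> Tuple[List[Tuple[str, float]], bool]:
    """Hypothetical helper: collect (shard, seconds) for every failed
    leadership transfer and report whether any UnreachableMember was logged."""
    failures: List[Tuple[str, float]] = []
    saw_unreachable = False
    for line in lines:
        match = _TRANSFER_FAILED.search(line)
        if match:
            failures.append((match.group("shard"), float(match.group("secs"))))
        if "UnreachableMember" in line:
            saw_unreachable = True
    return failures, saw_unreachable
```

Run over an opened karaf.log (`with open("karaf.log") as f: scan_karaf_log(f)`), a non-empty failure list together with `saw_unreachable` being False would match the symptom described in this report.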
I suspect the large transaction is the creation of a large list at the start of write-transactions (see CONTROLLER-1703), but I am not sure whether the following log line proves or disproves that:
2017-06-06 07:52:40,474 | INFO | ternal.Finalizer | lientBackedTransaction$Finalizer | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.SNAPSHOT | Aborted orphan transaction ClientSnapshot{identifier=member-1-datastore-config-fe-0-txn-7-0}
[0] https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/4/log.html.gz#s1-s18-t1-k2-k12-k1-k3-k1
[1] https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/4/odl1_karaf.log.gz
Blocks:
CONTROLLER-1693 UnreachableMember during remove-shard-replica prevents new leader to get elected (Resolved)
Is blocked by:
CONTROLLER-1703 Tweak Akka and Java timeouts to a reasonable compromise between stability and failure detection (In Review)