-
Bug
-
Resolution: Cannot Reproduce
-
None
-
Beryllium
-
None
-
Operating System: All
Platform: All
-
4823
-
High
Build used :
===================
Karaf distro from latest ODL Beryllium master code
Test Type :
===================
Adding OF Flows to inventory shard of config-ds datastore only (no switches connected)
Objective of test :
===================
To stress the datastore by adding flows spread across switches
Test Steps :
============
1. Bring up a healthy 3-node cluster
2. Write OF flows into the inventory config datastore
a. Used newWriteOnlyTransaction
b. Flows spread across switches
c. Flows for a single switch are pushed sequentially (the next flow is submitted in the onSuccess callback of the previous txn), while txns for different switches are pushed in parallel
3. Check whether all flows were pushed, and record metrics such as rate, state and latency
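For reference, the submission pattern in step 2c can be sketched roughly as below. This is only an illustration, not the actual test code: FlowStore, InMemoryFlowStore and ChainedFlowWriter are hypothetical names, and CompletableFuture stands in for the future returned when a write-only transaction is submitted. The switch ids used in main are also just examples.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for the datastore write path; NOT an OpenDaylight API. */
interface FlowStore {
    /** Asynchronously writes one flow; the future completes when the write commits. */
    CompletableFuture<Void> submit(String switchId, int flowId);
}

/** In-memory store used only to make the sketch runnable. */
class InMemoryFlowStore implements FlowStore {
    final Map<String, List<Integer>> committed = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    @Override
    public CompletableFuture<Void> submit(String switchId, int flowId) {
        // Each "transaction" commits on a pool thread, like an async submit.
        return CompletableFuture.runAsync(
            () -> committed
                .computeIfAbsent(switchId, k -> new CopyOnWriteArrayList<>())
                .add(flowId),
            pool);
    }

    void shutdown() {
        pool.shutdown();
    }
}

class ChainedFlowWriter {
    private final FlowStore store;
    private final int flowsPerSwitch;

    ChainedFlowWriter(FlowStore store, int flowsPerSwitch) {
        this.store = store;
        this.flowsPerSwitch = flowsPerSwitch;
    }

    /** Starts one chain per switch; the chains run in parallel. */
    CompletableFuture<Void> writeAll(List<String> switchIds) {
        CompletableFuture<?>[] chains = switchIds.stream()
            .map(id -> writeNext(id, 0))
            .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(chains);
    }

    /** Submits one flow; the next flow is submitted only after this one succeeds. */
    private CompletableFuture<Void> writeNext(String switchId, int flowId) {
        if (flowId >= flowsPerSwitch) {
            return CompletableFuture.completedFuture(null);
        }
        // thenCompose runs only on success, so a failed write stops this switch's chain.
        return store.submit(switchId, flowId)
            .thenCompose(ok -> writeNext(switchId, flowId + 1));
    }

    public static void main(String[] args) {
        InMemoryFlowStore store = new InMemoryFlowStore();
        ChainedFlowWriter writer = new ChainedFlowWriter(store, 5);
        List<String> switches = List.of("openflow:1", "openflow:2", "openflow:3");
        writer.writeAll(switches).join();
        store.shutdown();
        for (String id : switches) {
            System.out.println(id + " flows committed: " + store.committed.get(id).size());
        }
    }
}
```

With this shape, a failure on one switch (for example a no-leader error) only stops that switch's chain, so the per-switch state can be recorded separately from the others.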
Controllers (to cross-check logs):
===================================
c1 - Controller 1 with IP 10.183.181.41 - config-inventory-shard leader
c2 - Controller 2 with IP 10.183.181.42 - config-inventory-shard follower (flow transactions are initiated from c2)
c3 - Controller 3 with IP 10.183.181.43 - config-inventory-shard follower
Enclosed Logs:
==============
c1.karaf.log for controller c1
c2.karaf.log for controller c2
c3.karaf.log for controller c3
Observations and issue-summary:
================================
1. Leader (c1) marks one of the followers (c3) as UNREACHABLE - line 1230 of c1.karaf.log
2. Leader (c1) marks follower (c3) as REACHABLE again within a few milliseconds - line 1232 of c1.karaf.log
3. At line 1566 of c2.karaf.log, follower c2 marks leader c1 as UNREACHABLE and never recovers from that state. All transactions initiated from c2 then fail with org.opendaylight.controller.cluster.datastore.exceptions.NoShardLeaderException: Shard member-2-shard-inventory-config currently has no leader. Try again later.
4. After this point, transactions from the follower do not go through, which is understandable as a transient state while the leader is UNREACHABLE
5. However, even 2-3 minutes after the above incident, transactions from follower c2 or c3 keep failing.