Details
-
Bug
-
Status: Resolved
-
Resolution: Done
-
Operating System: All
Platform: All
-
5919
Description
This task continues https://bugs.opendaylight.org/show_bug.cgi?id=5577: when the FlatBatch RPC does not succeed, a RETRY is applied.
In an ODL cluster there is presumably a concurrency/synchronization issue, related to FAILOVER, when two cluster nodes call the OFP FlatBatch RPC.
Scenario: RPC ERROR -> RETRY -> FRM FAILOVER -> RECONCILE consists of these steps:
a. ODL Cluster Node1 receives a Config or Operational DataTreeChange in Inventory and runs the FlatBatch RPC.
b. FlatBatch returns an error, but it has already written some of the flows/groups to the switch, not all of them (so a RETRY is needed after the error).
c. An FRM FAILOVER occurs (meaning the Config and Operational DataTreeChange will no longer be received on ODL Cluster Node1 but on ODL Cluster Node2).
d. ODL Cluster Node1 needs to stop scheduling RETRY.
e. ODL Cluster Node1 might have been quick enough to RETRY the FlatBatch RPC anyway.
f. ODL Cluster Node2 needs to reconcile by running the FlatBatch RPC (all missing flows/groups are supposed to be written, but using the latest version of config and inventory).
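The steps above can be sketched as a small single-process simulation. This is only an illustration under stated assumptions, not OpenFlowPlugin code: `FakeSwitch`, `flat_batch`, `missing_flows` and the flow names are all hypothetical, and the "switch" is just an in-memory set. It shows a batch that fails part-way (step b) and a reconcile pass that writes only what is still missing relative to the latest config (step f).

```python
class FakeSwitch:
    """Hypothetical stand-in for a switch; not an OpenFlowPlugin class."""

    def __init__(self, fail_on=None):
        self.flows = set()      # flows actually present on the switch
        self.fail_on = fail_on  # flow name that makes a batch fail (illustrative)

    def flat_batch(self, flows):
        """Write flows in order; stop at the first failure, keeping earlier writes."""
        for flow in flows:
            if flow == self.fail_on:
                return False
            self.flows.add(flow)
        return True


def missing_flows(config, switch):
    """Flows present in config but not yet on the switch (what reconcile must write)."""
    return sorted(set(config) - switch.flows)


# Steps a/b: Node1 runs the batch, which fails after a partial write.
switch = FakeSwitch(fail_on="flow-2")
config = ["flow-1", "flow-2", "flow-3"]
first_attempt_ok = switch.flat_batch(config)
partial = sorted(switch.flows)

# Steps c/f: after failover, Node2 reconciles against the latest config.
switch.fail_on = None
latest_config = config + ["flow-4"]
todo = missing_flows(latest_config, switch)
reconcile_ok = switch.flat_batch(todo)
```

The point of the sketch is that the reconcile on Node2 is computed from the latest config and the current switch state, so it does not depend on whether Node1 managed to squeeze in a RETRY (step e) before it stopped.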
Assumptions:
- RECONCILIATION is performed only when a new node appears in the Operational Inventory DS. During FAILOVER the node will not be recreated in the Operational Inventory DS.
- (related to point d. above) RETRY is scheduled/triggered by the Inventory Operational DataTreeChange notification.
- If the old leader becomes a slave, no RETRIES are scheduled anymore because the DataTreeChange notification will not be received.
- (related to points e. and f. above) Concurrent calls from two different nodes of the cluster might occur (FRM of the old DS leader and FRM of the new DS leader).
- If Cluster Node1 is running the RPC for a particular device, then Node2 needs to get an RPC concurrent error and Node2 needs to RETRY.
- If Cluster Node2 is running the RPC for a particular device, then Node1 needs to get an RPC concurrent error because of obsolete data (what about using the operational data timestamp? Both RETRY and RECONCILIATION need to use Operational Data) and Node1 should not RETRY.
- (related to point f. above) Node2 needs to know that RECONCILE is needed (FAILOVER occurred).
- What about comparing timestamps of the last operational data? (Other, problematic alternatives: a/ ClusterDataTreeChangeListener + EOS, b/ modifying the existing API to learn about this event.)
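One way to picture the timestamp idea from the assumptions above is a per-device guard that rejects overlapping calls and tells "busy, retry later" apart from "your data is obsolete, do not retry". This is a minimal single-process sketch, not an existing OpenFlowPlugin API: `DeviceBatchGuard`, `try_begin`, `end`, `should_retry` and the result strings are hypothetical names, and in a real cluster the guard state would have to be shared between nodes.

```python
import threading


class DeviceBatchGuard:
    """Hypothetical per-device guard keyed by operational data timestamp."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = {}  # device id -> node currently running a batch
        self._last_ts = {}  # device id -> newest operational timestamp accepted

    def try_begin(self, device, node, op_timestamp):
        with self._lock:
            # Caller works from older operational data than an accepted call.
            if op_timestamp < self._last_ts.get(device, op_timestamp):
                return "STALE"
            # Another node is still running a batch for this device.
            if device in self._running:
                return "BUSY"
            self._running[device] = node
            self._last_ts[device] = op_timestamp
            return "OK"

    def end(self, device):
        with self._lock:
            self._running.pop(device, None)


def should_retry(result):
    """Only a BUSY rejection is worth retrying; STALE callers must give up."""
    return result == "BUSY"


guard = DeviceBatchGuard()
r1 = guard.try_begin("openflow:1", "node1", 10)  # old leader starts its batch
r2 = guard.try_begin("openflow:1", "node2", 11)  # new leader collides with it
guard.end("openflow:1")
r3 = guard.try_begin("openflow:1", "node2", 11)  # new leader retries
r4 = guard.try_begin("openflow:1", "node1", 10)  # old leader has obsolete data
```

In this sketch Node2 receives a concurrent error while Node1 is running and retries, whereas Node1 is rejected as stale once Node2 has run with newer operational data and does not retry, matching the two cases described in the assumptions.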
Issue Links
- is blocked by
-
OPNFLWPLUG-725 FRsync – integrate with single cluster-wide service API
- Resolved