[OPNFLWPLUG-691] FR-sync - Improve FRM calling RPC FlatBatch during Failover & RETRY Created: 19/May/16  Updated: 27/Sep/21  Resolved: 28/Jul/16

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Jozef Slezák Assignee: Andrej Leitner
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Blocks
is blocked by OPNFLWPLUG-725 FRsync – integrate with single cluste... Resolved
External issue ID: 5919

 Description   

This task continues https://bugs.opendaylight.org/show_bug.cgi?id=5577: when the FlatBatch RPC does not succeed, a RETRY is applied.

In an ODL cluster there is supposedly a concurrency/synchronization issue when two cluster nodes call the OFP FlatBatch RPC around a FAILOVER.

Scenario: RPC ERROR -> RETRY -> FRM FAILOVER -> RECONCILE consists of these steps:
a. ODL Cluster Node1 receives a Config or Operational DataTreeChange in Inventory and runs RPC FlatBatch
b. FlatBatch returns an error, but it has already written some (not all) of the flows/groups to the switch, so a RETRY is needed
c. FRM FAILOVER then occurs (Config+Operational DataTreeChange will no longer be received on ODL Cluster Node1 but on ODL Cluster Node2)
d. ODL Cluster Node1 needs to stop scheduling RETRY
e. ODL Cluster Node1 might have been quick enough to RETRY RPC FlatBatch anyway
f. ODL Cluster Node2 needs to reconcile by running RPC FlatBatch (all missing flows/groups are supposed to be written, using the latest version of config and inventory)
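
Steps d. and e. could be sketched roughly as follows. This is only an illustration in plain Java, not the FRM or OpenFlowPlugin code: the class RetryRegistry, its method names and the string device ids are all hypothetical, and "ownership" stands in for whatever signal tells a cluster node that it no longer receives DataTreeChange notifications for a device.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper for illustration; not part of the OpenFlowPlugin API. */
final class RetryRegistry {
    // Devices for which this cluster node currently receives DataTreeChange notifications.
    private final Set<String> ownedDevices = ConcurrentHashMap.newKeySet();
    // Devices with a RETRY pending after a failed FlatBatch.
    private final Set<String> pendingRetries = ConcurrentHashMap.newKeySet();

    void onOwnershipGained(String deviceId) {
        ownedDevices.add(deviceId);
    }

    /** Step d: after FAILOVER the old node drops any RETRY it still had pending. */
    void onOwnershipLost(String deviceId) {
        ownedDevices.remove(deviceId);
        pendingRetries.remove(deviceId);
    }

    /** Registers a RETRY only while this node still owns the device. */
    boolean onFlatBatchFailed(String deviceId) {
        if (!ownedDevices.contains(deviceId)) {
            return false;
        }
        pendingRetries.add(deviceId);
        return true;
    }

    /** Returns true at most once per registered RETRY, and only while the device is still owned. */
    boolean takeRetry(String deviceId) {
        return ownedDevices.contains(deviceId) && pendingRetries.remove(deviceId);
    }
}
```

The ownership check and the RETRY itself are not atomic here, so a RETRY taken just before onOwnershipLost can still reach the switch. That is the race described in step e, which this sketch does not remove.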

Assumptions:

  • RECONCILIATION is performed only when a new node appears in the Operational Inventory DS. During FAILOVER the node will not be recreated in the Operational Inventory DS.
  • (related to above point d.) RETRY is scheduled/triggered by an Inventory Operational DataTreeChange notification
  • If the old leader becomes a slave, no RETRIES are scheduled anymore because the DataTreeChange notification will not be received
  • (related to above points e. and f.) concurrent calls from two different nodes of the cluster might occur (FRM of the old DS leader and FRM of the new DS leader)
  • if Cluster Node1 is running the RPC for a concrete device, then Node2 needs to get an RPC concurrency error and Node2 needs to RETRY
  • if Cluster Node2 is running the RPC for a concrete device, then Node1 needs to get an RPC concurrency error because of obsolete data (what about using the operational data timestamp? both RETRY and RECONCILIATION need to use Operational Data) and Node1 should not RETRY
  • (related to above point f.) Node2 needs to know that RECONCILE is needed (FAILOVER occurred)
  • what about comparing timestamps of the last operational data? (other, problematic alternatives: a/ ClusterDataTreeChangeListener + EOS, b/ modifying the existing API to expose this event)
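
One way the timestamp idea from the last bullets might look, as a sketch only. The class DeviceBatchGuard, the Outcome enum and the long timestamp are hypothetical names introduced here. The sketch also assumes a single guard that both callers can reach, which a real cluster would have to provide through shared state or on the device-owning node. It is not how OPNFLWPLUG-725 resolved the issue.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-device guard for illustration; not an OpenFlowPlugin class. */
final class DeviceBatchGuard {
    enum Outcome { ACCEPTED, CONCURRENT, OBSOLETE }

    private static final class State {
        boolean running;
        long lastTimestamp = Long.MIN_VALUE;
    }

    private final Map<String, State> states = new HashMap<>();

    /**
     * Called before running FlatBatch for a device.
     * operationalTimestamp is the (assumed) timestamp of the operational data the caller used.
     */
    synchronized Outcome tryBegin(String deviceId, long operationalTimestamp) {
        State s = states.computeIfAbsent(deviceId, id -> new State());
        if (operationalTimestamp < s.lastTimestamp) {
            return Outcome.OBSOLETE;   // caller used older data: should not RETRY
        }
        if (s.running) {
            return Outcome.CONCURRENT; // another call is in flight: caller should RETRY
        }
        s.running = true;
        s.lastTimestamp = operationalTimestamp;
        return Outcome.ACCEPTED;
    }

    /** Called when the FlatBatch call for the device has completed. */
    synchronized void finish(String deviceId) {
        State s = states.get(deviceId);
        if (s != null) {
            s.running = false;
        }
    }
}
```

With this ordering of checks, the new leader (newer timestamp) arriving while the old leader's call is running gets CONCURRENT and can RETRY, whereas the old leader arriving with older data gets OBSOLETE whether or not the new leader's call is still running.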


 Comments   
Comment by Andrej Leitner [ 12/Jul/16 ]

currently trying to resolve this issue within OPNFLWPLUG-725

Comment by Andrej Leitner [ 28/Jul/16 ]

merged in OPNFLWPLUG-725
https://git.opendaylight.org/gerrit/#/q/project:openflowplugin+branch:master+topic:bug6170
