[CONTROLLER-1399] Clustering: DCN varies in normal and restart scenarios Created: 05/Aug/15 Updated: 14/Sep/15 Resolved: 14/Sep/15 |
|
| Status: | Verified |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | Lithium |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Muthukumaran Kothandaraman | Assignee: | Tom Pantelis |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| External issue ID: | 4094 | ||||||||
| Description |
|
Test environment: single-node controller.
In the following case, a DCN is received under normal operating conditions, but no DCN is received after the node is restarted (i.e., after restoration has completed). We checked whether the DCN registration happens consistently during restart, and the registration appears to have no issues.
Normal Behavior (columns: Path, Scope, Data Change, DCN Received)
Restart Behavior (columns: Path, Scope, Data Change, DCN Received)
When normal DCN behavior differs from restart behavior, apps that act on DCNs behave differently when a node restart occurs. |
| Comments |
| Comment by Muthukumaran Kothandaraman [ 05/Aug/15 ] |
|
Please note that during restart no new entry is created in the mentioned path. Instead, data is restored from the persistent store. The expectation is that the DCN should fire consistently in normal and restart conditions, provided the path and scope of the registration remain the same. |
| Comment by Deepthi V V [ 05/Aug/15 ] |
|
Attachment Controller restart test case and logs.txt has been added with description: Controller Restart Test Case, Log And Analysis |
| Comment by Deepthi V V [ 12/Aug/15 ] |
|
The bug is valid even under normal operating conditions, when the listener registers late for DCNs and data is already available in the datastore before the listener is registered. |
| Comment by Tony Tkacik [ 18/Aug/15 ] |
|
Are you using DataChangeListener interface or DataTreeChangeListener interface? |
| Comment by Deepthi V V [ 18/Aug/15 ] |
|
Tony, we are using the DataChangeListener interface. The listener registration is wildcarded. The app is interested in all list entries. |
| Comment by Muthukumaran Kothandaraman [ 18/Aug/15 ] |
|
Hi Tony, Is firing DCNs during recovery of the Config DS a requirement irrespective of this bug? This can introduce scaling issues. For instance, when a specific subtree recovers a million entries and puts them back into the Config DS, if DCNs are fired and apps take their own time executing their side effects in DCN handlers, would that not cause unpredictable delays in recovery? Regards |
| Comment by Muthukumaran Kothandaraman [ 18/Aug/15 ] |
|
Hi Tony, The clustered datastore already performs a deferred listener registration after recovery is completed. So, ideally, the notifications should not fire at all. As mentioned by Deepthi, we also tried late registration, i.e., populating the datastore with a few entries and then performing the listener registration. In this case too we do not get notifications. So, essentially, the question boils down to the exact expected behavior during recovery so that apps do not follow usage patterns like Such app patterns can lead to trouble during restart. Moreover, playing DCNs during recovery is actually a bad pattern. What are your thoughts? |
| Comment by Moiz Raja [ 25/Aug/15 ] |
|
final Optional<NormalizedNode<?, ?>> currentState = dataTree.takeSnapshot().readNode(path);
else { event = null; }
This is what we do when a DCN listener is registered in CDS. Notice that when we create the initial event we specify the scope as BASE. I wonder whether, when this event is processed by the DataBroker, it fails to match the event with the binding-aware listeners, so that no event is generated on the binding side. I still need to confirm this, but I suspect it may be the root cause. |
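The registration behavior described in this comment (read the current state at the registered path and, if data is present, hand the listener an initial event) can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in: `InitialEventSketch`, its string paths and values, and the `BiConsumer` listener are not the CDS or DataBroker API, and the sketch does not model scopes such as BASE or SUBTREE.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiConsumer;

/** Hypothetical toy store; only illustrates "replay current state on registration". */
final class InitialEventSketch {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, List<BiConsumer<String, String>>> listeners = new HashMap<>();

    /** Stores the value and notifies listeners registered on exactly this path. */
    void write(String path, String value) {
        data.put(path, value);
        for (BiConsumer<String, String> l : listeners.getOrDefault(path, new ArrayList<>())) {
            l.accept(path, value);
        }
    }

    /** Registers a listener and, if data already exists at the path, delivers it once. */
    void register(String path, BiConsumer<String, String> listener) {
        listeners.computeIfAbsent(path, k -> new ArrayList<>()).add(listener);
        // Analogous to reading the node from a snapshot at registration time.
        Optional<String> currentState = Optional.ofNullable(data.get(path));
        // Present: synthesize an initial "created" notification. Absent: no initial event.
        currentState.ifPresent(value -> listener.accept(path, value));
    }

    public static void main(String[] args) {
        InitialEventSketch store = new InitialEventSketch();
        store.write("/cars/car-1", "restored-from-disk");
        // Late registration still sees the data that was written earlier.
        store.register("/cars/car-1", (p, v) -> System.out.println(p + " = " + v));
    }
}
```

In this toy model a listener that registers after the data was written (or restored) still receives one notification, which is the consistency the reporters expect between normal operation and restart.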
| Comment by Anil Vishnoi [ 25/Aug/15 ] |
|
In my scenario, I register the listener with the data change scope SUBTREE. I can give it a quick try by changing the scope to BASE and see if it works. |
| Comment by Anil Vishnoi [ 25/Aug/15 ] |
|
I changed the data change scope from SUBTREE to BASE for the listener registration, but there was no change in behavior. I also tried the data change scope ONE, with the same result: no change. |
| Comment by Tom Pantelis [ 26/Aug/15 ] |
|
While testing, I mimicked Anil's OVSDB registration by adding a "passenger" list to the car test model. The registration is:
InstanceIdentifier<Passenger> path = InstanceIdentifier
getDataBrokerDependency().registerDataChangeListener(
So the last argument is wildcarded for the Passenger list. When I create a car with a passenger, the listener gets notified as expected, but on restart it doesn't. The problem is that the read call:
Optional<NormalizedNode<?, ?>> currentState =
returns absent. The DOM path is /(urn:sal-clustering-it:car?revision=2014-08-18)cars/car-entry ]
The first path arg corresponds to the MapNode in the tree and the second represents the wildcarded MapEntryNode instances. As NormalizedNodes.findNode traverses the path args, when it hits the last path arg, NormalizedNodes.getDirectChild expects a key path arg of type NodeIdentifierWithPredicates for a parent MapNode:
else if (node instanceof MapNode && pathArg instanceof
However, the actual path arg is a NodeIdentifier because it's wildcarded. It seems the solution is to add a case for "node instanceof MapNode && pathArg instanceof NodeIdentifier" that returns the MapNode. |
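The lookup failure described in this comment, and the proposed extra case, can be sketched with simplified stand-in types. `WildcardLookupSketch`, `Node`, and `PathArg` below are hypothetical and only loosely modeled on the yangtools classes named above; a `PathArg` with a null key plays the role of a wildcarded NodeIdentifier, and the `wildcardAware` flag exists only so the sketch can show both behaviors.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-ins for the tree and path types; not the yangtools API. */
final class WildcardLookupSketch {

    /** One path step; key == null models a wildcarded (keyless) list step. */
    static final class PathArg {
        final String name;
        final String key;
        PathArg(String name, String key) { this.name = name; this.key = key; }
        boolean isWildcard() { return key == null; }
    }

    /** A tree node; when isMap is true, children are list entries indexed by key. */
    static final class Node {
        final String name;
        final boolean isMap;
        final Map<String, Node> children = new LinkedHashMap<>();
        Node(String name, boolean isMap) { this.name = name; this.isMap = isMap; }
        Node add(String childKey, Node child) { children.put(childKey, child); return this; }
    }

    static Optional<Node> getDirectChild(Node node, PathArg arg, boolean wildcardAware) {
        if (node.isMap) {
            if (!arg.isWildcard()) {
                // Existing case: a keyed step selects one entry of the list.
                return Optional.ofNullable(node.children.get(arg.key));
            }
            // Proposed case: a wildcarded step on a list returns the list itself.
            return wildcardAware ? Optional.of(node) : Optional.empty();
        }
        // Non-list parents look up children by name.
        return Optional.ofNullable(node.children.get(arg.name));
    }

    /** Walks the path one step at a time, stopping as soon as a step is absent. */
    static Optional<Node> findNode(Node root, PathArg[] path, boolean wildcardAware) {
        Optional<Node> current = Optional.of(root);
        for (PathArg arg : path) {
            if (!current.isPresent()) {
                break;
            }
            current = getDirectChild(current.get(), arg, wildcardAware);
        }
        return current;
    }
}
```

With `wildcardAware` set to false, a path ending in a wildcarded step yields an empty result even though the list holds entries, which mirrors the "returns absent" symptom. With it set to true, the same path yields the list node. The sketch only covers a wildcard as the last step.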
| Comment by Tom Pantelis [ 26/Aug/15 ] |
|
In my previous comment, I meant to say "The first "passenger" path arg corresponds to the MapNode ..." (In reply to Tom Pantelis from comment #11) |
| Comment by Tom Pantelis [ 26/Aug/15 ] |
|
Adding the case as I mentioned previously might easily work if the wildcarded path arg is at the end, but it would be trickier if it is in the middle. The binding read interface does not allow wildcarded reads: I tried reading the same path that was used to register, and it threw an IllegalArgumentException. So naturally neither does the DOM API, although it returns absent instead of throwing an exception. Maybe the solution is to read the root and let the ResolveDataChangeEventsTask code sort it out. This may be expensive, but we could offload it to the listener actor. Tony - what do you think? |
| Comment by Tony Tkacik [ 27/Aug/15 ] |
|
I would not read the root, but rather read up to the first wildcard and then let ResolveDataChangeEvent resolve this. |
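The suggestion above (read only the concrete prefix of the registered path, then resolve the remaining, possibly wildcarded, steps against that subtree) can be sketched as follows. This is an illustration under stated assumptions, not the code from the eventual patch: `ReadToFirstWildcardSketch`, its string path steps, and the `"*"` wildcard marker are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: read up to the first wildcard, then expand the rest. */
final class ReadToFirstWildcardSketch {
    static final String WILDCARD = "*";

    static final class Node {
        final Map<String, Node> children = new LinkedHashMap<>();
        /** Returns the named child, creating it if needed (handy for building test trees). */
        Node child(String name) { return children.computeIfAbsent(name, k -> new Node()); }
    }

    /** Every step before the first wildcard; the whole path if there is none. */
    static List<String> prefixBeforeFirstWildcard(List<String> path) {
        int idx = path.indexOf(WILDCARD);
        return idx < 0 ? path : path.subList(0, idx);
    }

    /** Plain read over concrete steps only; null when any step is missing. */
    static Node read(Node root, List<String> concretePath) {
        Node current = root;
        for (String step : concretePath) {
            current = current.children.get(step);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    /** Expands the remaining steps under start and records the full path of each match. */
    static void resolve(Node start, List<String> remaining, String pathSoFar, List<String> out) {
        if (remaining.isEmpty()) {
            out.add(pathSoFar);
            return;
        }
        String step = remaining.get(0);
        List<String> rest = remaining.subList(1, remaining.size());
        for (Map.Entry<String, Node> e : start.children.entrySet()) {
            if (step.equals(WILDCARD) || step.equals(e.getKey())) {
                resolve(e.getValue(), rest, pathSoFar + "/" + e.getKey(), out);
            }
        }
    }

    /** Paths of all existing nodes that a listener registered on registeredPath should see. */
    static List<String> initialMatches(Node root, List<String> registeredPath) {
        List<String> prefix = prefixBeforeFirstWildcard(registeredPath);
        Node start = read(root, prefix);
        List<String> out = new ArrayList<>();
        if (start != null) {
            String base = prefix.isEmpty() ? "" : "/" + String.join("/", prefix);
            resolve(start, registeredPath.subList(prefix.size(), registeredPath.size()), base, out);
        }
        return out;
    }
}
```

Because the expansion starts at the node found by the concrete prefix, only that subtree is visited rather than the whole datastore, and a wildcard in the middle of the path is handled the same way as one at the end.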
| Comment by Tom Pantelis [ 27/Aug/15 ] |
|
Will do. (In reply to Tony Tkacik from comment #14) |
| Comment by Tom Pantelis [ 10/Sep/15 ] |
|
stable/lithium patch https://git.opendaylight.org/gerrit/#/c/26548/ |
| Comment by Anil Vishnoi [ 14/Sep/15 ] |
|
Testing the fix. |
| Comment by Anil Vishnoi [ 14/Sep/15 ] |
|
I tested the above patch and it works fine. I can see the Data Change blob when I restart the controller. |
| Comment by Anil Vishnoi [ 14/Sep/15 ] |
|
I changed the state to Verified/Fixed. Let me know if I need to change it back to Resolved/Fixed. |