[CONTROLLER-1399] Clustering: DCN varies in normal and restart scenarios Created: 05/Aug/15  Updated: 14/Sep/15  Resolved: 14/Sep/15

Status: Verified
Project: controller
Component/s: clustering
Affects Version/s: Lithium
Fix Version/s: None

Type: Bug
Reporter: Muthukumaran Kothandaraman Assignee: Tom Pantelis
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File Controller restart test case and logs.txt    
Issue Links:
Duplicate
is duplicated by CONTROLLER-1410 After controller reboot, applications... Resolved
External issue ID: 4094

 Description   

Test environment: single-node controller

For the following cases, a DCN (data change notification) is received under normal operating conditions, but no DCN is received after the node restarts (i.e., once restoration is complete).

We checked whether DCN registration happens consistently during restart, and the registration itself appears to have no issues.

Normal Behavior

Path             Scope    Data Change        DCN Received
========================================================================
/container/list  BASE     Add entry to list  On a per-entry basis
/container/list  SUBTREE  Add entry to list  On a per-entry basis

Restart Behavior

Path             Scope    Data Change        DCN Received
========================================================================
/container/list  BASE     Add entry to list  No DCN event received
/container/list  SUBTREE  Add entry to list  No DCN event received

When the normal DCN behavior differs from the restart behavior, apps that act on DCNs behave differently when a node restarts.



 Comments   
Comment by Muthukumaran Kothandaraman [ 05/Aug/15 ]

Please note that during restart, no new entry is created under the mentioned path. Instead, the data is restored from the persistent store.

The expectation is that the DCN should fire consistently in normal and restart conditions, provided the path and scope of the registration remain the same.

Comment by Deepthi V V [ 05/Aug/15 ]

Attachment Controller restart test case and logs.txt has been added with description: Controller Restart Test Case, Log And Analysis

Comment by Deepthi V V [ 12/Aug/15 ]

The bug also occurs under normal operating conditions when the listener registers late, i.e., the data is already available in the datastore before the listener is registered.

Comment by Tony Tkacik [ 18/Aug/15 ]

Are you using DataChangeListener interface or DataTreeChangeListener interface?
Are you registering your listener using wildcards (i.e., not specifying keys)?

Comment by Deepthi V V [ 18/Aug/15 ]

Tony, we are using the DataChangeListener interface.

The listener registration is wildcarded. The app is interested in all list entries.

Comment by Muthukumaran Kothandaraman [ 18/Aug/15 ]

Hi Tony,

Is firing DCNs during recovery of the Config DS a requirement, irrespective of this bug?

This can introduce scaling issues. For instance, if a subtree recovers a million entries that are put back into the Config DS, and DCNs are fired while apps take their own time executing side effects in their DCN handlers, would that not cause unpredictable delays in recovery?

Regards
Muthu

Comment by Muthukumaran Kothandaraman [ 18/Aug/15 ]

Hi Tony,

The clustered datastore already defers listener registration until recovery is complete. So, ideally, the notifications should not fire at all.

As Deepthi mentioned, we also tried late registration, i.e., populating the datastore with a few entries and then registering the listener. In this case, too, we do not get notifications.

So the question essentially boils down to the exact expected behavior during recovery, so that apps do not follow usage patterns like:
transact with Config DS -> get DCN -> transact with Operational DS.

Such app patterns can lead to trouble during restart.

Moreover, replaying DCNs during recovery is actually a bad pattern.

What are your thoughts ?

Comment by Moiz Raja [ 25/Aug/15 ]

final Optional<NormalizedNode<?, ?>> currentState = dataTree.takeSnapshot().readNode(path);
final DOMImmutableDataChangeEvent event;
if (currentState.isPresent()) {
    final NormalizedNode<?, ?> data = currentState.get();
    event = DOMImmutableDataChangeEvent.builder(DataChangeScope.BASE)
            .setAfter(data)
            .addCreated(path, data)
            .build();
} else {
    event = null;
}

This is what we do in the CDS when a DCN listener is registered. Notice that when we create the initial event, we specify the scope as BASE. I am wondering whether, when the DataBroker processes this event, it fails to match the event with the binding-aware listeners, so that no event is generated on the binding side. I still need to confirm this, but I suspect this may be the root cause.
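A minimal sketch of the registration-time behavior described above, assuming a toy datastore that maps plain string paths to values (the class and its methods are hypothetical and are not the CDS or DOMImmutableDataChangeEvent API). On registration, the current state at the registered path is read and, if present, delivered to the new listener as an initial "created" event. If that read comes back absent, a late listener never hears about data that already exists, which is the symptom reported in this issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Consumer;

/** Hypothetical toy datastore illustrating the initial event sent on listener registration. */
public class InitialEventSketch {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, List<Consumer<String>>> listeners = new HashMap<>();

    public void put(String path, String value) {
        data.put(path, value);
        // Normal operation: notify listeners registered on exactly this path.
        for (Consumer<String> l : listeners.getOrDefault(path, List.of())) {
            l.accept("created " + path + "=" + value);
        }
    }

    /** Stand-in for readNode(path): empty when nothing is stored at the exact path. */
    public Optional<String> read(String path) {
        return Optional.ofNullable(data.get(path));
    }

    public void register(String path, Consumer<String> listener) {
        listeners.computeIfAbsent(path, k -> new ArrayList<>()).add(listener);
        // Late registration: replay the current state as an initial "created" event,
        // mirroring the currentState.isPresent() branch of the CDS snippet.
        read(path).ifPresent(v -> listener.accept("created " + path + "=" + v));
    }

    public static List<String> demo() {
        InitialEventSketch store = new InitialEventSketch();
        store.put("/cars/car1", "passenger-a");   // data restored before any registration
        List<String> events = new ArrayList<>();
        store.register("/cars/car1", events::add); // exact path: initial event delivered
        store.register("/cars/*", events::add);    // toy "wildcard": read finds nothing, no event
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [created /cars/car1=passenger-a]
    }
}
```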

Comment by Anil Vishnoi [ 25/Aug/15 ]

In my scenario, I register the listener with a data change scope of SUBTREE. I can give it a quick try by changing the scope to BASE and see if it works.

Comment by Anil Vishnoi [ 25/Aug/15 ]

I changed the data change scope of the listener registration from SUBTREE to BASE, but the behavior did not change. I also tried data change scope ONE, with the same result: no change.

Comment by Tom Pantelis [ 26/Aug/15 ]

To test this, I mimicked Anil's OVSDB registration by adding a "passenger" list to the car test model. The registration is:

InstanceIdentifier<Passenger> path = InstanceIdentifier
        .create(Cars.class)
        .child(CarEntry.class, new CarEntryKey(new CarId("car1")))
        .child(Passenger.class);

getDataBrokerDependency().registerDataChangeListener(
        LogicalDatastoreType.CONFIGURATION, path, new CarListener(),
        DataChangeScope.SUBTREE);

So the last path argument, for the Passenger list, is wildcarded (no key is specified).

When I create a car with a passenger the listener gets notified as expected. But on restart it doesn't. The problem is that the read call:

Optional<NormalizedNode<?, ?>> currentState =
        dataTree.takeSnapshot().readNode(path);

returns absent.

The DOM path is

/(urn:sal-clustering-it:car?revision=2014-08-18)cars/car-entry/car-entry[{(urn:sal-clustering-it:car?revision=2014-08-18)id=car1}]/passenger/passenger

The first path arg corresponds to the MapNode in the tree, and the second represents the wildcarded MapEntryNode instances. As NormalizedNodes.findNode traverses the path args, when it hits the last path arg, NormalizedNodes.getDirectChild expects a key path arg of type NodeIdentifierWithPredicates for a parent MapNode:

} else if (node instanceof MapNode && pathArg instanceof NodeIdentifierWithPredicates) {
    return (Optional) ((MapNode) node).getChild((NodeIdentifierWithPredicates) pathArg);
}

However, the actual path arg is a NodeIdentifier because it is wildcarded.

It seems the solution is to add a case for "node instanceof MapNode && pathArg instanceof NodeIdentifier" that returns the MapNode itself.
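To make the failure concrete, here is a toy Java model of the lookup described above. The classes (PathArg, ContainerNode, MapNode) are hypothetical stand-ins, not the yangtools NormalizedNode API: a PathArg without a key plays the role of a NodeIdentifier, and one with a key plays the role of a NodeIdentifierWithPredicates. With wildcardFix off, the wildcarded passenger path resolves to empty, like the absent read; with it on, the proposed extra case returns the passenger MapNode.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical toy model of NormalizedNodes.findNode / getDirectChild. */
public class FindNodeSketch {
    /** key == null mimics a NodeIdentifier; a non-null key mimics a NodeIdentifierWithPredicates. */
    public static final class PathArg {
        final String name;
        final String key;
        public PathArg(String name, String key) { this.name = name; this.key = key; }
    }

    public interface Node {}

    /** Container-like node (also used here for list entries); children looked up by name. */
    public static final class ContainerNode implements Node {
        final Map<String, Node> children;
        public ContainerNode(Map<String, Node> children) { this.children = children; }
    }

    /** List node whose entries are looked up by key only. */
    public static final class MapNode implements Node {
        final Map<String, Node> entries;
        public MapNode(Map<String, Node> entries) { this.entries = entries; }
    }

    static Optional<Node> getDirectChild(Node node, PathArg arg, boolean wildcardFix) {
        if (node instanceof ContainerNode && arg.key == null) {
            return Optional.ofNullable(((ContainerNode) node).children.get(arg.name));
        } else if (node instanceof MapNode && arg.key != null) {
            // The only MapNode case in the original lookup: a keyed entry.
            return Optional.ofNullable(((MapNode) node).entries.get(arg.key));
        } else if (wildcardFix && node instanceof MapNode && arg.key == null) {
            // Proposed extra case: a wildcarded arg under a MapNode yields the MapNode itself.
            return Optional.of(node);
        }
        return Optional.empty(); // e.g. an unkeyed arg under a MapNode without the fix
    }

    public static Optional<Node> findNode(Node root, List<PathArg> path, boolean wildcardFix) {
        Optional<Node> current = Optional.of(root);
        for (PathArg arg : path) {
            current = current.flatMap(n -> getDirectChild(n, arg, wildcardFix));
        }
        return current;
    }

    /** root -> cars -> car-entry[car1] -> passenger list with one entry. */
    public static Node sampleTree() {
        Node passengers = new MapNode(Map.of("alice", new ContainerNode(Map.of())));
        Node car1 = new ContainerNode(Map.of("passenger", passengers));
        Node cars = new ContainerNode(Map.of("car-entry", new MapNode(Map.of("car1", car1))));
        return new ContainerNode(Map.of("cars", cars));
    }

    /** Mirrors .../car-entry[id=car1]/passenger/passenger; the last arg has no key. */
    public static final List<PathArg> WILDCARDED = List.of(
            new PathArg("cars", null),
            new PathArg("car-entry", null),
            new PathArg("car-entry", "car1"),
            new PathArg("passenger", null),
            new PathArg("passenger", null));

    public static void main(String[] args) {
        Node root = sampleTree();
        System.out.println(findNode(root, WILDCARDED, false).isPresent()); // false (absent)
        System.out.println(findNode(root, WILDCARDED, true).isPresent());  // true (MapNode)
    }
}
```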

Comment by Tom Pantelis [ 26/Aug/15 ]

In my previous comment, I meant to say "The first "passenger" path arg corresponds to the MapNode ..."

Comment by Tom Pantelis [ 26/Aug/15 ]

Adding the case I mentioned previously would work easily if the wildcarded path arg is at the end, but would be trickier if it is in the middle.

The binding read interface does not allow wildcarded reads: I tried reading the same path that was used for registration and it threw an IllegalArgumentException. So, naturally, neither does the DOM API, although it returns absent instead of throwing an exception.

Maybe the solution is to read the root and let the ResolveDataChangeEventsTask code sort it out. This may be expensive, but we could offload it to the listener actor.

Tony - what do you think?

Comment by Tony Tkacik [ 27/Aug/15 ]

I would not read the root, but rather read up to the first wildcard and then let ResolveDataChangeEvent resolve this.
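The "read up to the first wildcard" idea can be sketched as a helper that cuts the path just before its first wildcarded argument. This is a hypothetical illustration, not the code in the Gerrit patch: it assumes, as in the DOM path shown earlier, that a list appears as two consecutive arguments with the same name and that the second one carries a key unless it is wildcarded (a container that happens to share its parent's name would fool this heuristic). For the passenger path, the prefix ends at the passenger list itself; a wildcard in the middle simply yields a shorter prefix, leaving the rest to event resolution.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: the longest readable prefix of a possibly wildcarded path. */
public class WildcardPrefixSketch {
    public static final class PathArg {
        public final String name;
        public final String key; // null means no predicates (NodeIdentifier-like)

        public PathArg(String name, String key) { this.name = name; this.key = key; }

        @Override
        public String toString() { return key == null ? name : name + "[" + key + "]"; }
    }

    public static List<PathArg> prefixBeforeFirstWildcard(List<PathArg> path) {
        List<PathArg> prefix = new ArrayList<>();
        for (int i = 0; i < path.size(); i++) {
            PathArg arg = path.get(i);
            // Assumed list-entry step: same name as the preceding (list) arg.
            boolean listEntryStep = i > 0 && path.get(i - 1).name.equals(arg.name);
            if (listEntryStep && arg.key == null) {
                break; // first wildcard: the prefix ends at the list node itself
            }
            prefix.add(arg);
        }
        return prefix;
    }

    public static void main(String[] args) {
        List<PathArg> path = List.of(
                new PathArg("cars", null),
                new PathArg("car-entry", null),
                new PathArg("car-entry", "car1"),
                new PathArg("passenger", null),
                new PathArg("passenger", null));
        System.out.println(prefixBeforeFirstWildcard(path)); // [cars, car-entry, car-entry[car1], passenger]
    }
}
```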

Comment by Tom Pantelis [ 27/Aug/15 ]

Will do.


Comment by Tom Pantelis [ 10/Sep/15 ]

stable/lithium patch https://git.opendaylight.org/gerrit/#/c/26548/

Comment by Anil Vishnoi [ 14/Sep/15 ]

Testing the fix.

Comment by Anil Vishnoi [ 14/Sep/15 ]

I tested the above patch and it works fine. I can see the data change blob when I restart the controller.

Comment by Anil Vishnoi [ 14/Sep/15 ]

I changed the state to Verified/Fixed; let me know if I need to change it back to Resolved/Fixed.

Generated at Wed Feb 07 19:55:27 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.