[OPNFLWPLUG-423] StatisticsManager misses nodeadded and noderemoved operations when exceptions occur Created: 29/Apr/15  Updated: 27/Sep/21  Resolved: 02/Jun/15

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Kamal Rameshan Assignee: Kamal Rameshan
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 3085

 Description   

Currently the node-added and node-removed operations are batched together with the stat notification operations.
When an exception occurs in the tx chain, all the operations in the queue are discarded.

This design is acceptable for the stats but is not suited to node-added and node-removed.

Node added and node removed should be applied directly and should not be queued.

The issue is mostly seen with 250+ switches on a 3-node cluster.

This might solve many issues related to node-added and node-removed being submitted.
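
A minimal sketch of the proposed split, for illustration only. All class and method names here are hypothetical and do not come from the OpenFlowPlugin code base, and a plain list stands in for the operational data store. It shows stat operations being batched and dropped on a tx chain failure, while node-added and node-removed are applied directly and so survive the failure.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch; none of these names are taken from StatisticsManager.
class NodeLifecycleSketch {
    // Stand-in for the operational data store: records what was applied.
    final List<String> applied = new ArrayList<>();

    // Batched, low-priority stat operations waiting for the next submit.
    private final Queue<String> statQueue = new ArrayDeque<>();

    // Stat operations are queued and written later, in order.
    void enqueueStat(String op) {
        statQueue.add(op);
    }

    // Node-added and node-removed bypass the queue and are applied at once.
    void applyNodeEvent(String op) {
        applied.add(op);
    }

    // Normal path: write all queued stat operations in FIFO order.
    void flush() {
        while (!statQueue.isEmpty()) {
            applied.add(statQueue.poll());
        }
    }

    // Tx chain failure: queued stat operations are discarded.
    // Returns how many operations were dropped.
    int onTxChainFailed() {
        int dropped = statQueue.size();
        statQueue.clear();
        return dropped;
    }
}
```

With this shape, a failure between `enqueueStat` and `flush` loses only stat operations, which the next collection cycle can presumably regenerate.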



 Comments   
Comment by Kamal Rameshan [ 30/Apr/15 ]

https://git.opendaylight.org/gerrit/#/c/19360/

Comment by Vaclav Demcak [ 04/May/15 ]

Hi Kamal,

you are right, a tx chain failure could cause a lot of problems. But in general the queue preserves the ordering of the operations. So please think about the following scenario: devices connect and disconnect quickly, so statistics from a disconnected device can be written to a newly connected device, because we have lost the ordering.

So could we submit each add/remove node operation as its own tx commit?
I mean, we could submit the pending tx before every add/remove node operation, wait for the result, and then submit the add/remove as a separate commit. But then we have to take all add/remove node operations out of the queue in the cleanDataStoreOperQueue method and try to send them again in the same order.
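
A rough sketch of the second part of this idea, under stated assumptions: the `Op` and `Kind` types and the `cleanKeepingNodeOps` method are hypothetical stand-ins, not the real cleanDataStoreOperQueue implementation. On a failure the queue is emptied, the stat operations are dropped, and the add/remove node operations are returned in their original order so the caller can resubmit them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; the real queue and operation types differ.
class RetainingQueueSketch {
    enum Kind { NODE_ADDED, NODE_REMOVED, STAT }

    static final class Op {
        final Kind kind;
        final String nodeId;

        Op(Kind kind, String nodeId) {
            this.kind = kind;
            this.nodeId = nodeId;
        }
    }

    private final Deque<Op> queue = new ArrayDeque<>();

    void enqueue(Op op) {
        queue.addLast(op);
    }

    int size() {
        return queue.size();
    }

    // Assumed counterpart of cleanDataStoreOperQueue: empties the queue,
    // drops stat operations, and returns the node operations in the order
    // they were enqueued so they can be sent again.
    List<Op> cleanKeepingNodeOps() {
        List<Op> retry = new ArrayList<>();
        for (Op op : queue) {          // ArrayDeque iterates first to last
            if (op.kind != Kind.STAT) {
                retry.add(op);
            }
        }
        queue.clear();
        return retry;
    }
}
```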

Comment by Kamal Rameshan [ 04/May/15 ]

Hi Vaclav,

I guess the writing of stale stat operations is an issue we still have, since inventory-manager processes node-removed instantly and stats-manager processes it later. So for that brief period we do see stale stats.

I see it as follows: we have 2 managers writing to operational, inventory and stats. I feel neither of these managers should delay processing node-added and node-removed, because a delay causes problems.
How to handle the stale stat operations is a design issue.
Maybe we can generate a UID when a node gets added to the stats, associate that UID with the stat-ds-operation, and feed it to the queue. On removal from the queue, if the UID is not the one in the nodecollector map, it is a stale operation to be ignored. There can be many design solutions for detecting the stale operations.

In short, I don't see any value in queueing node-added and node-removed as part of the other non-priority stat operations.
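
The UID idea could look roughly like the sketch below. This is an illustration under assumptions, not the actual change: the class name, method names, and the `String` node ids are hypothetical, and the map only plays the role of the nodecollector map mentioned above.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of UUID tagging for stale stat operations.
class StaleStatFilterSketch {
    // Node id -> UUID generated when that node was last added.
    private final Map<String, UUID> nodeSessions = new ConcurrentHashMap<>();

    // Called on node-added: registers a fresh UUID for this connection.
    // Stat operations created for the node carry the returned UUID.
    UUID nodeAdded(String nodeId) {
        UUID session = UUID.randomUUID();
        nodeSessions.put(nodeId, session);
        return session;
    }

    // Called on node-removed: forgets the node's current UUID.
    void nodeRemoved(String nodeId) {
        nodeSessions.remove(nodeId);
    }

    // Checked when a stat operation is taken off the queue. The operation
    // is current only if its UUID matches the node's present registration;
    // a missing or different UUID means the operation is stale.
    boolean isCurrent(String nodeId, UUID opSession) {
        return opSession.equals(nodeSessions.get(nodeId));
    }
}
```

If a device disconnects and reconnects quickly, operations queued during the first connection carry the old UUID and are ignored, so they cannot be written against the new connection.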

Comment by Kamal Rameshan [ 04/May/15 ]

Hi Vaclav,

I have added a simple fix to mark stat operations via UUIDs to ignore stale operations.

https://git.opendaylight.org/gerrit/#/c/19360/

Let me know if this addresses the issue.

Thanks
Kamal

Generated at Wed Feb 07 20:32:26 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.