[CONTROLLER-263] MDSAL Notification pool threads not getting spawned besides the core threads Created: 01/Apr/14  Updated: 25/Jul/23  Resolved: 23/Apr/14

Status: Resolved
Project: controller
Component/s: mdsal
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Kamal Rameshan Assignee: Kamal Rameshan
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Mac OS
Platform: Macintosh


External issue ID: 645
Priority: Normal

 Description   

The MDSAL default notification pool is currently configured as follows:
Core threads: 4
Max threads: 32
Queue: unbounded LinkedBlockingQueue

During cbench tests, it is observed that the number of threads working on the queue never goes beyond 4.
It should ideally reach 32, since the notification queue is not being drained fast enough.



 Comments   
Comment by Kamal Rameshan [ 01/Apr/14 ]

ThreadPoolExecutor does not create threads beyond the core pool size unless the task queue rejects the offered task, i.e. the queue is full.

Since the notification queue is unbounded, that condition is never met, and hence no new threads are spawned beyond the 4 core ones.
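
For illustration, a minimal standalone sketch (not the controller code itself; the class name, task count and sleep times are made up) showing why an unbounded LinkedBlockingQueue keeps the pool at its core size:

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class UnboundedQueueDemo {
        public static void main(String[] args) throws InterruptedException {
            // Same shape as the reported pool: 4 core, 32 max, unbounded queue.
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    4, 32, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());

            // Flood the pool with slow tasks.
            for (int i = 0; i < 1000; i++) {
                pool.execute(new Runnable() {
                    @Override
                    public void run() {
                        try {
                            Thread.sleep(50);
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                });
            }

            Thread.sleep(500);
            // Tasks pile up in the queue instead of triggering new threads,
            // so this prints 4, not 32.
            System.out.println("pool size = " + pool.getPoolSize());
            System.out.println("queued    = " + pool.getQueue().size());
            pool.shutdownNow();
        }
    }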

Comment by Robert Varga [ 02/Apr/14 ]

I am not sure whether having an unbounded queue is the correct behavior here.

Comment by Robert Varga [ 02/Apr/14 ]

Sorry for the incomplete comment.

While I see the problem here, and the fix does improve throughput somewhat, I am afraid it does not introduce a backpressure mechanism, so at some point we will see producers overwhelm consumers and build up a huge backlog.

I think we need a more complete solution, probably a per-producer queue and some sort of fair-dispatch thread pool, which would:

1) prevent a single producer from monopolizing the notification dispatch threadpool
2) apply backpressure to heavy producers, such that we don't end up with huge backlogs

I am thinking something along the lines of token buckets.

Tony: where would be a good place to put such a rate-limiting mechanism?

Kamal: can you introduce a knob that places a limit on the queue size? ThreadPoolExecutor.CallerRunsPolicy looks like a good first cut at the rejection policy in that case.
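
A sketch of what such a knob might look like, assuming a bounded LinkedBlockingQueue plus CallerRunsPolicy; the property name and default limit here are hypothetical, not an existing controller option:

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public final class BoundedNotificationPool {

        // Hypothetical knob; the real configuration key and default are to be decided.
        private static final int QUEUE_LIMIT =
                Integer.getInteger("notification.queue.limit", 1000);

        public static ThreadPoolExecutor create() {
            // With a bounded queue the pool grows toward 32 threads once
            // QUEUE_LIMIT tasks are waiting; when the queue and the pool are
            // both saturated, CallerRunsPolicy makes the submitting (producer)
            // thread run the task itself, which is a crude form of backpressure.
            return new ThreadPoolExecutor(
                    4, 32, 60, TimeUnit.SECONDS,
                    new LinkedBlockingQueue<Runnable>(QUEUE_LIMIT),
                    new ThreadPoolExecutor.CallerRunsPolicy());
        }

        private BoundedNotificationPool() {
        }
    }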

Comment by Kamal Rameshan [ 02/Apr/14 ]

The patch is to fix a bug in the current design.

I am in total agreement that an unbounded queue is not the approach we want to take, as an OOM error is not far off.

Rather than creating a queue per producer, my take would be to use a RateLimiter (com.google.common.util.concurrent) for each producer.
The queue could then be bounded, possibly with a higher limit.

The exact throttling logic is something we still need to come up with.
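
A rough sketch of the per-producer RateLimiter idea under discussion; the permits-per-second value and the dispatcher shape are assumptions, not an agreed design:

    import com.google.common.util.concurrent.RateLimiter;

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.Executor;

    public final class RateLimitedDispatcher {

        // Illustrative rate only; the actual throttling policy is still open.
        private static final double PERMITS_PER_SECOND = 1000.0;

        private final ConcurrentMap<Object, RateLimiter> limiters =
                new ConcurrentHashMap<Object, RateLimiter>();
        private final Executor delegate;

        public RateLimitedDispatcher(Executor delegate) {
            this.delegate = delegate;
        }

        public void submit(Object producer, Runnable notification) {
            RateLimiter limiter = limiters.get(producer);
            if (limiter == null) {
                RateLimiter fresh = RateLimiter.create(PERMITS_PER_SECOND);
                RateLimiter existing = limiters.putIfAbsent(producer, fresh);
                limiter = existing != null ? existing : fresh;
            }
            // Guava's RateLimiter is token-bucket based: a heavy producer
            // blocks here instead of flooding the shared notification queue.
            limiter.acquire();
            delegate.execute(notification);
        }
    }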

I can take this up as an enhancement, if we agree on the design.

But for now, I think we should have a fix for the thread pool; it is currently broken.

Comment by Kamal Rameshan [ 03/Apr/14 ]

Working on a patch to implement CallerRunsPolicy or any other way to throttle.

Will run some cbench tests to see if it impacts performance significantly.

Comment by Kamal Rameshan [ 07/Apr/14 ]

The CallerRunsPolicy implementation causes thread boundaries to be violated and also introduces performance issues.

Will raise a separate bug as an 'enhancement' to introduce a throttle, as Robert suggested.

For now, just fixing it with a patch for the service release:

https://git.opendaylight.org/gerrit/#/c/5857/
