[CONTROLLER-1566] The mailbox of actor for data tree change listener should use limited-size queue Created: 07/Dec/16  Updated: 19/Dec/16  Resolved: 19/Dec/16

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: 0.4.0
Fix Version/s: None

Type: Bug
Reporter: Kangqian Yin Assignee: Kangqian Yin
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: PNG File limited-length-mailbox-queue-not-handled.png    
External issue ID: 7314

 Description   

We ran into a problem of exhausting the JVM heap while running the OpenDaylight controller. The cause was that the data tree change listeners were consuming messages much more slowly than the shard data trees were producing them. At the same time, the mailbox of the actor for the data tree change listener uses an unbounded queue, so data tree change messages accumulated without limit until the JVM heap was exhausted.

After we limited the size of the mailbox queue, the JVM no longer sank into the mire of repeatedly garbage-collecting an exhausted heap, and it stayed stable no matter how slowly the data tree change listeners consumed their data tree change messages.

Additionally, we added a new metric for the data tree change listener actor to track the number of messages dropped due to the limited-size queue. The attachment is a screenshot of an example of this metric.
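To illustrate the difference between the two mailbox behaviours described above, here is a minimal JDK-only sketch (the class and capacity are illustrative; this is not the controller's actual Akka mailbox code). An unbounded queue accepts messages indefinitely, while a bounded queue's offer() fails once capacity is reached, which is the event a dropped-message metric would count.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch of unbounded vs. bounded mailbox queues.
public class MailboxSketch {
    // Unbounded: every offer succeeds, so a slow consumer lets the
    // queue (and eventually the heap) grow without limit.
    public static final ConcurrentLinkedQueue<String> unbounded =
            new ConcurrentLinkedQueue<>();

    // Bounded: offer() returns false once capacity (here: 2) is
    // reached, so the producer learns the consumer has fallen behind.
    public static final ArrayBlockingQueue<String> bounded =
            new ArrayBlockingQueue<>(2);

    public static boolean offerBounded(String msg) {
        return bounded.offer(msg);
    }
}
```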



 Comments   
Comment by Kangqian Yin [ 07/Dec/16 ]

Attachment limited-length-mailbox-queue-not-handled.png has been added with description: New metric of "not-handled" for actors with mailbox type of "bounded-mailbox"

Comment by Vratko Polak [ 09/Dec/16 ]

> After we limited the size of the mailbox queue

This sounds like you already have a fix ready. If you are willing to get this fix merged:
0. Click "take" in "Assigned To" field of this Bug and set "Status" to "in progress" to let other people know this Bug has a person dedicated to fixing it.
1. Upload your patch to Gerrit and add Controller committers as reviewers.
2. Post a comment here with the Gerrit link and set "Status" to "waiting for review".
3. Work with reviewers to get your patch merged, then set status to "fixed".

There are more verbose guides in various subpages of https://wiki.opendaylight.org/view/BestPractices

Comment by Tom Pantelis [ 13/Dec/16 ]

The problem with using a bounded mailbox is that change notifications will get dropped, which likely isn't good either. You really should look into why the data tree change listener isn't keeping up and try to alleviate that (e.g. is it doing blocking operations that could be done asynchronously?). Also, maybe the front-end client(s) are producing transactions too fast.

Comment by Vratko Polak [ 13/Dec/16 ]

> dropped messages due to the limited-size queue

Oh, I missed that. No application expects data tree changes to be dropped silently.

Would it be possible to block the committing thread (instead of dropping message) until queue is writable? Or would that lead to deadlocks (or some other queue exploding)?

Comment by Tom Pantelis [ 13/Dec/16 ]

(In reply to Vratko Polák from comment #3)
> > dropped messages due to the limited-size queue
>
> Oh, I missed that. No application expects data tree changes to be dropped
> silently.
>
> Would it be possible to block the committing thread (instead of dropping
> message) until queue is writable? Or would that lead to deadlocks (or some
> other queue exploding)?

Back pressure is the ideal way to do it but in this case the committing thread would block the shard from processing subsequent messages which isn't ideal. One slow listener could block raft heartbeats and subsequent transactions.

If a listener may do expensive processing or block I think it should offload that from the listener notification thread. If the queue or thread pool gets inundated then the listener logic can decide how best to handle that.
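Tom's suggestion of offloading expensive work from the notification thread could look roughly like the following sketch (the class and method names are hypothetical, not an actual OpenDaylight API): the listener callback only hands the change off to its own executor, so the notification thread returns immediately.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a listener that offloads processing.
public class OffloadingListener {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    public final AtomicInteger processed = new AtomicInteger();

    // Called on the notification thread; must not block.
    public void onDataTreeChanged(String change) {
        worker.submit(() -> process(change));
    }

    private void process(String change) {
        // Expensive or blocking work would go here, off the
        // notification thread.
        processed.incrementAndGet();
    }

    public void shutdownAndWait() {
        worker.shutdown();
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

If the worker's own queue then fills up, the listener logic is the right place to decide whether to coalesce, spill, or fail, as Tom notes.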

Comment by Kangqian Yin [ 14/Dec/16 ]

Just as you said, it's unacceptable to drop data tree change messages silently when the message queue is full. So I will add a metric that counts dropped messages by overriding the class MeteredBoundedMailbox.MeteredMessageQueue. Applications can monitor this metric to see whether their listeners are running slowly or losing critical data tree change messages.

By means of this metric, we found that our BGP listeners consume data tree changes more slowly than they are produced by the processing of BGP packets. We're now trying to offload the changes to a new queue and consolidate them.

The disadvantage of limiting the queue size that I've encountered is that it blocks the shard actor for the duration of a timeout when the shard tries to deliver a data tree change message to a full message queue. That decreases the throughput of ODL's distributed datastore; however, it is a way of exerting back-pressure on the producers.
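The trade-off described here (block the producer up to a timeout, then drop and count) can be sketched with JDK primitives as follows. The names are illustrative and this is not the MeteredBoundedMailbox implementation; it only shows the mechanism being discussed.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: bounded queue with a timed offer and a
// dropped-message counter.
public class BackpressureQueue {
    private final ArrayBlockingQueue<Object> queue;
    public final AtomicLong dropped = new AtomicLong();

    public BackpressureQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Blocks the caller (here: the producer, i.e. the shard actor)
    // for at most timeoutMs waiting for space; on timeout the message
    // is dropped and the metric is bumped.
    public boolean enqueue(Object msg, long timeoutMs) {
        try {
            if (queue.offer(msg, timeoutMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        dropped.incrementAndGet();
        return false;
    }
}
```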

Comment by Robert Varga [ 16/Dec/16 ]

This is a more complicated problem. The first part is the transport, which is the inverse of BUG-5280, i.e. the solution can use the same mechanism.

As for the backpressure problem, we cannot allow the system to grind to a halt simply because of a bug in some obscure application. To that effect we have introduced DOMDataTreeListener, which has two things going for it:

  • proper error callback, i.e. "you cannot keep up, I killed you, restart if you wish"
  • ability to allow notifications to be state-compressed, hence doing a similar thing to what PingPongDataBroker does

These, combined, can be used to:

  • not overflow actor inbox (by exerting backpressure)
  • keep the number of notifications low (state-compress them when they start piling up)
  • if all else fails kill the listener
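The state-compression idea in the second bullet can be sketched like this (an illustrative toy, not PingPongDataBroker's actual implementation): when notifications pile up, successive changes to the same path are merged, so the consumer only ever sees the latest state per path and the backlog stays bounded by the number of distinct paths.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of per-path state compression of notifications.
public class CoalescingQueue {
    // Keyed by subtree path: a newer change to the same path replaces
    // the older pending one instead of growing the queue.
    private final Map<String, String> pending = new LinkedHashMap<>();

    public synchronized void enqueue(String path, String newState) {
        pending.put(path, newState); // coalesce per path
    }

    public synchronized int size() {
        return pending.size();
    }

    public synchronized String latest(String path) {
        return pending.get(path);
    }
}
```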
Comment by Robert Varga [ 16/Dec/16 ]

A further note: the Data{Tree}ChangeListener API contract does not allow a message to be dropped, as that amounts to data corruption.

Comment by Kangqian Yin [ 17/Dec/16 ]

You're right. Data corruption in the application is unavoidable if the application consumes messages more slowly than the distributed datastore produces them.

After we limited the listener queue length, the BGP listeners could not work correctly because they lost a lot of data.

However, this will force applications such as the BGP listeners to find a way to catch up with the producer. That's a good thing, as it will improve the quality and throughput of the whole system.

Given the complexity of applications, I don't think the distributed datastore can achieve the DataTreeChangeListener contract of no messages lost and no data corruption.

The reasonable thing the distributed datastore can do on this issue is to keep as many messages as the JVM heap allows, rather than a fixed number of messages.

Comment by Kangqian Yin [ 17/Dec/16 ]

https://git.opendaylight.org/gerrit/#/c/49518/

Comment by Robert Varga [ 17/Dec/16 ]

Pushing the problem out to applications is unacceptable, simply because we do not give them the tools to solve the problem and it will lead to massive amounts of code duplication in the application layer, with the various workarounds having different kinds of bugs.

To illustrate on the specific example of BGP listeners, how do you propose they solve the problem, when they are completely unaware of the fact that the implementation has chosen to break the DTCN stream and hence the sum of observed state no longer matches what is in the data store?

I do not agree with your assessment that we are unable to fulfill a reasonable interface contract (where DOMDataTreeListener is reasonable). I believe we have all the tools we need to get the job done.

Is your assessment based on the CDS architecture or its implementation deficiencies?

Comment by Kangqian Yin [ 19/Dec/16 ]

Hi Robert, what's the CDS architecture? My assessment of the BGP listeners comes from performance testing of our BGP application, which is customized from ODL's bgpcep project.

Currently, I don't have the time or the ability to implement the much better solution you described in "Comment 6".

However, I look forward to seeing you or Tom implement that solution to avoid exhausting the JVM heap.

Until we can use that solution, I have to limit the queue size, because exhausting the JVM heap is a much more dangerous thing. We've suffered it once, and we won't suffer it again.

Generated at Wed Feb 07 19:55:52 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.