[CONTROLLER-651] Multi-thread parts of WriteTx commit process Created: 26/Jul/14  Updated: 18/Aug/14  Resolved: 18/Aug/14

Status: Resolved
Project: controller
Component/s: mdsal
Affects Version/s: Helium
Fix Version/s: None

Type: Improvement
Reporter: Tom Pantelis Assignee: Tom Pantelis
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All



 Description   

Currently the WriteTx commit process is entirely single-threaded. At scale this may become a bottleneck. We should off-load and multi-thread at least parts of the commit to avoid blocking the commit thread and to gain some parallelism.

One part is the notification of DataChangeListeners. Currently the ThreePhaseCommitImpl calls DataChangeListeners on an executor that executes tasks in the caller's thread, so DataChangeListeners block the commit thread. This was done because DataChangeListeners need to be notified of change events in the order they occur.

However we can queue events per DataChangeListener and dispatch them serially on separate threads. I have prototyped a QueuedNotificationManager to do this. In addition, the QueuedNotificationManager optimizes its memory footprint by only allocating and maintaining a queue and executor task for a listener when there are pending notifications. Once all notifications have been dispatched, the queue and task are discarded.

Another part of the commit process that can be off-loaded is the notification of the ListenableFuture that's returned from the submit call. Typically, clients use Futures#addCallback so as not to block on the ListenableFuture. The default behavior of Futures#addCallback is to use MoreExecutors#sameThreadExecutor, which results in the client callback running on the commit thread. While it is recommended that clients not do any expensive processing or blocking, I think it's fragile to rely on that. One scenario: a client does a commit in an RPC call and sets the RPC result Future in the commit callback; if the caller of the RPC also uses Futures#addCallback with MoreExecutors#sameThreadExecutor, it will also run on the commit thread, and so on.

To off-load commit ListenableFuture callbacks, I have prototyped an AsyncNotifyingListeningExecutorService class that uses an AsyncNotifyingListenableFutureTask. The task takes an Executor and notifies registered ListenableFuture Runnables on that Executor when the task completes and the future result is set. ListenableFuture#addListener also takes an Executor, which defaults to MoreExecutors#sameThreadExecutor. That Executor is still used, but if the AsyncNotifyingListenableFutureTask detects that the listener Runnable would run on the same thread that completed the task, the Runnable is off-loaded to the other Executor.
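To illustrate the idea (this is a sketch, not the actual AsyncNotifyingListenableFutureTask code — class and method names here are illustrative, and CompletableFuture stands in for Guava's ListenableFuture): record which thread completed the future, and if a callback registered with same-thread semantics would end up running on that thread, re-route it to a dedicated executor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Illustrative sketch: detect when a listener would run on the completing
// (commit) thread and off-load it to a separate executor instead.
public final class AsyncNotifyingFuture<V> {
    private final CompletableFuture<V> delegate = new CompletableFuture<>();
    private final Executor offloadExecutor;        // where off-loaded callbacks run
    private volatile Thread completingThread;

    public AsyncNotifyingFuture(Executor offloadExecutor) {
        this.offloadExecutor = offloadExecutor;
    }

    public void set(V value) {
        completingThread = Thread.currentThread(); // remember the commit thread
        delegate.complete(value);
    }

    // listenerExecutor defaults to same-thread semantics in the real API
    public void addListener(Runnable listener, Executor listenerExecutor) {
        delegate.whenComplete((v, t) -> listenerExecutor.execute(() -> {
            if (Thread.currentThread() == completingThread) {
                // would block the commit thread - off-load instead
                offloadExecutor.execute(listener);
            } else {
                listener.run();
            }
        }));
    }
}
```

With `Runnable::run` as the listener executor (the same-thread case), the check fires and the callback lands on the off-load executor rather than the commit thread.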



 Comments   
Comment by Robert Varga [ 31/Jul/14 ]

I'd like to understand the threading model of QueuedNotificationManager – is it a thread per listener? The description is very vague in this regard.

I did not quite get mechanics of the Future notification part. Can you describe it somewhere in the wiki?

Comment by Tom Pantelis [ 31/Jul/14 ]

(In reply to Robert Varga from comment #1)
> I'd like to understand the threading model of QueuedNotificationManager –
> is it a thread per listener? The description is very vague in this regard.

I have pushed https://git.opendaylight.org/gerrit/#/c/9305/ with the new concurrent classes.

There is a thread per listener but it's dynamic to conserve resources.

Here's the sequence:

1) The commit thread creates a DataChangeEvent and submits it to the QueuedNotificationManager (QNM).

2) The QNM contains a map of listener -> queue. The QNM looks up in the map. There's no entry yet for the listener so it creates a queue containing the event and submits a task to an executor.

3) The executor task runs and loops and dispatches events until the queue is empty.

4) Once the queue is empty, the task is done, so it removes the listener's queue entry from the listener map and exits.

5) The next time an event occurs for the listener, repeat 2) - 4), i.e. a new queue is created and a new task submitted.

If another event occurs during 3), the QNM will find an existing queue for the listener in the map and it just offers the event.

So in this manner, there is a separate queue and thread for each listener; events are dispatched serially to each listener and concurrently with respect to other listeners. When there are no events to dispatch to a listener, its queue, task and thread are eventually cleaned up.
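The sequence above can be sketched roughly as follows (a simplified illustration, not the actual QueuedNotificationManager code — the real class avoids a single global lock, but the lifecycle of the per-listener queue and task is the same):

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executor;
import java.util.function.BiConsumer;

// Sketch of the queue-per-listener scheme: a listener's queue and dispatcher
// task exist only while it has pending notifications.
public final class MiniQueuedNotificationManager<L, N> {
    private final Map<L, ArrayDeque<N>> queues = new HashMap<>();
    private final Executor executor;
    private final BiConsumer<L, N> invoker;

    public MiniQueuedNotificationManager(Executor executor, BiConsumer<L, N> invoker) {
        this.executor = executor;
        this.invoker = invoker;
    }

    // Steps 1-2: submit an event; create the queue and dispatcher task only
    // if no task is currently draining this listener's queue.
    public synchronized void submitNotification(L listener, N notification) {
        ArrayDeque<N> queue = queues.get(listener);
        if (queue != null) {
            queue.add(notification);   // an active task will pick it up
            return;
        }
        queue = new ArrayDeque<>();
        queue.add(notification);
        queues.put(listener, queue);
        executor.execute(() -> dispatch(listener));
    }

    // Steps 3-4: drain the queue serially, then discard it and exit.
    private void dispatch(L listener) {
        for (;;) {
            N next;
            synchronized (this) {
                next = queues.get(listener).poll();
                if (next == null) {
                    queues.remove(listener);   // queue drained: clean up and finish
                    return;
                }
            }
            invoker.accept(listener, next);    // serial per listener
        }
    }
}
```

Events for one listener are always dispatched in submission order, while different listeners are serviced by independent tasks.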

>
> I did not quite get mechanics of the Future notification part. Can you
> describe it somewhere in the wiki?

Which wiki are you referring to (there are like thousands)?

It's probably easier to discuss this on a meeting or hangout. Email me when you're available.

Comment by Tom Pantelis [ 31/Jul/14 ]

Re: the Future notification, the DOMDataCommitCoordinatorImpl uses a ListeningExecutorService backed by a single-threaded executor. When #submit is called it submits a Callable task which returns a ListenableFuture back to the client.

Clients will typically use Futures#addCallback to be notified asynchronously when the Future is complete (at least that's what's recommended). Futures#addCallback takes an optional Executor on which to invoke the listener's callback. This defaults to MoreExecutors#sameThreadExecutor, which is what clients will commonly use.

The Callable task is wrapped by an internal Runnable by the executor. When the task executes, the Runnable calls the Callable and, when complete, it sets the Future result (or exception). The derived ListenableFuture then submits its client listener callback task(s) on their respective Executors. If a client's Executor is one that runs in the same thread, then it will run on the single commit thread and will block it until it returns. This is the part I'm proposing to alleviate.
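A small plain-JDK demo of the problem being described (CompletableFuture stands in for Guava's ListenableFuture here; the latch only exists to make the demo deterministic): with same-thread callback semantics, the client callback executes on the single commit thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Demonstrates that a callback registered with same-thread semantics runs
// on whichever thread completes the future - i.e. the commit thread.
public class SameThreadCallbackDemo {
    public static String run() throws Exception {
        ExecutorService commitExecutor =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "commit-thread"));
        CountDownLatch registered = new CountDownLatch(1);

        CompletableFuture<String> commitFuture = CompletableFuture.supplyAsync(() -> {
            try {
                registered.await();   // hold completion until the callback is attached
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "committed";
        }, commitExecutor);

        // No executor supplied: the dependent action runs on the completing thread.
        CompletableFuture<String> callbackThread =
            commitFuture.thenApply(result -> Thread.currentThread().getName());

        registered.countDown();
        String name = callbackThread.get();
        commitExecutor.shutdown();
        return name;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("callback ran on: " + run());  // prints "callback ran on: commit-thread"
    }
}
```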

As I outlined in the bug description, I think it's fragile to rely on client callbacks not doing any expensive processing or blocking. Also, we lose parallelism if it's single-threaded.

So I prototyped an AsyncNotifyingListeningExecutorService class that will use a specified Executor to off-load client ListenableFuture callbacks if the Executor the client registered with will run in the same thread on which the Future was completed.

Comment by Tom Pantelis [ 31/Jul/14 ]

In addition, the QueuedNotificationManager and AsyncNotifyingListeningExecutorService will use executors with bounded queues and a RejectedExecutionHandler that blocks when the queue is full. I've been debating which blocking technique to use: the CallerRunsPolicy RejectedExecutionHandler or one that does a blocking put to the queue. I'm favoring CallerRunsPolicy because it allows the caller (i.e. the single commit thread) to eventually continue, although in a degraded fashion (assuming the notification task completes).
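The two back-pressure options being weighed can be sketched as plain ThreadPoolExecutor configurations (capacities and names here are illustrative, not the merged defaults):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of the two blocking strategies for a bounded executor queue.
public class BoundedExecutors {
    // Option 1: CallerRunsPolicy - when the queue is full, the submitting
    // (commit) thread runs the task itself; degraded but eventually progresses.
    static ExecutorService callerRuns(int poolSize, int queueCapacity) {
        return new ThreadPoolExecutor(poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<>(queueCapacity),
            new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Option 2: a handler that does a blocking put, parking the submitter
    // until the queue has room (note: would hang if used after shutdown).
    static ExecutorService blockingPut(int poolSize, int queueCapacity) {
        return new ThreadPoolExecutor(poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<>(queueCapacity),
            (task, executor) -> {
                try {
                    executor.getQueue().put(task);   // block until space frees up
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RejectedExecutionException("interrupted while enqueuing", e);
                }
            });
    }
}
```

Either way, no task is silently dropped; the difference is only where the overflow work happens when the queue saturates.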

The single-threaded commit executor's queue is currently unbounded. Since the commit thread could still block on listener notifications, we should bound the queue (say capacity 5000). Then the question is what do we do if the commit executor queue gets full? We can't use CallerRunsPolicy because that would allow for out-of-order commits. We could use a RejectedExecutionHandler that does a blocking put to the queue to help relieve the back pressure but then what happens to the caller of that caller and so on.
I think at some point somebody has to give up.

In discussing with Colin, we think it's reasonable to just fail the commit and let the caller deal with that. That case should be extreme - if we get to the point where that many commits are backed up there's something seriously wrong, probably deadlock or endless loop somewhere.

Comment by Robert Varga [ 31/Jul/14 ]

I agree on the general mechanics as a first step.

We cannot use the CallerRuns policy for event notifications because that would violate event delivery ordering and "logically single-threaded" contract of DataChangeListener.

The correct way of doing things here is to use RejectedExecutionHandler on the queue and reserve an entry during pre-commit phase. That way the commit process will fail predictably and leave the datastore in a consistent state if we are running low.

As an incremental (and maybe optional, this is API contract question) improvement, we can maintain a queue-per-listener (with multiple queues serviced by a single thread). If we are not required to maintain event boundaries, we can perform very cheap (== cost of building a transaction payload) state compression on egress. That will have the effect of coalescing modifications into larger transactions, thus lowering the transactional activity without impairing throughput, as you essentially maintain a single entry.

Comment by Tom Pantelis [ 31/Jul/14 ]

(In reply to Robert Varga from comment #5)
> I agree on the general mechanics as a first step.
>
> We cannot use the CallerRuns policy for event notifications because that
> would violate event delivery ordering and "logically single-threaded"
> contract of DataChangeListener.
>

There are actually two queues in question here: one for the executor tasks and one per listener. If the QNM calls the executor to submit a notification task, that means either there isn't a current notification task running and dispatching events for that listener, or there is a current task but it has finished dispatching and is in the process of exiting. For the latter, there is a small window between the time it sees an empty queue and breaks out of its loop and when it removes itself from the cache prior to exiting. Either way, a new task is being started for the listener and all previous events have already been dispatched. So I think CallerRunsPolicy is valid here. Please take a look at the code I submitted.

> The correct way of doing things here is to use RejectedExecutionHandler on
> the queue and reserve an entry during pre-commit phase. That way the commit
> process will fail predictably and leave the datastore in a consistent state
> if we are running low.
>

Which queue are you referring to? The single-threaded commit queue? If so, I'm not sure what you mean by "reserve an entry during pre-commit phase". The way I implemented commit executor failure is by catching the RejectedExecutionException from the submit in DOMDataCommitCoordinatorImpl and returning a failed Future with TransactionCommitFailedException. So that writeTx is dropped and not even attempted to be committed. That shouldn't affect the datastore in any way unless I'm missing something. The controller code is in a draft: https://git.opendaylight.org/gerrit/#/c/9422/.
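The failure path described here can be sketched as follows (illustrative only — names like `submitCommit` are hypothetical, the inner exception class stands in for the real TransactionCommitFailedException, and CompletableFuture stands in for Guava's CheckedFuture): catch the rejection from the commit executor and hand the caller a pre-failed future, so the transaction is dropped without ever touching the datastore.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;

// Sketch of failing a commit when the bounded commit-executor queue is full.
public class CommitSubmitSketch {
    static class TransactionCommitFailedException extends Exception {
        TransactionCommitFailedException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    static CompletableFuture<Void> submitCommit(ExecutorService commitExecutor,
                                                Runnable commitTask) {
        try {
            return CompletableFuture.runAsync(commitTask, commitExecutor);
        } catch (RejectedExecutionException e) {
            // Commit queue is full: drop this writeTx and fail fast. The
            // datastore is untouched because the commit was never attempted.
            CompletableFuture<Void> failed = new CompletableFuture<>();
            failed.completeExceptionally(
                new TransactionCommitFailedException("Could not enqueue commit", e));
            return failed;
        }
    }
}
```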

> As an incremental (and maybe optional, this is API contract question)
> improvement, we can maintain a queue-per-listener (with multiple queues
> serviced by a single thread). If we are not required to maintain event
> boundaries, we can perform very cheap (== cost of building a transaction
> payload) state compression on egress. That will have the effect of
> coalescing modifications into larger transactions, thus lowering the
> transactional activity without impairing throughput, as you essentially
> maintain a single entry.

Comment by Tom Pantelis [ 13/Aug/14 ]

The functional changes have been merged.

https://git.opendaylight.org/gerrit/#/c/9905/ is a follow-up patch to convert to use the config system instead of system properties for executor config params.

Comment by Tom Pantelis [ 18/Aug/14 ]

Follow-up patch https://git.opendaylight.org/gerrit/#/c/9905/ has been merged.

Generated at Wed Feb 07 19:53:33 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.