[CONTROLLER-1261] Clustering: Throttling in TransactionProxy does not work well with the batching Created: 19/Apr/15  Updated: 11/Jun/15  Resolved: 11/Jun/15

Status: Resolved
Project: controller
Component/s: mdsal
Affects Version/s: Post-Helium
Fix Version/s: None

Type: Bug
Reporter: Tom Pantelis Assignee: Moiz Raja
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 3019
Priority: Normal

 Description   

The TransactionProxy uses a semaphore to apply throttling back-pressure to operations. The semaphore is initialized with the mailbox limit (500) defined in akka.conf, even though the shard and transaction actor mailboxes use an unbounded queue. On each write, merge, delete, or read call, the proxy attempts to acquire a permit from the semaphore (decrementing the permit count). If a permit cannot be acquired, the attempt times out after 5 sec and the operation proceeds anyway.
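A minimal sketch of this acquire-with-timeout pattern, using java.util.concurrent.Semaphore. The class and method names (OperationThrottle, acquire, release) are hypothetical and are not taken from the TransactionProxy source; the sketch only illustrates that a timed-out acquire does not block the caller from proceeding.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the throttling described above; not the actual controller code. */
final class OperationThrottle {
    private final Semaphore permits;
    private final long timeoutMillis;

    OperationThrottle(int initialPermits, long timeoutMillis) {
        this.permits = new Semaphore(initialPermits);
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Tries to take one permit, waiting up to the timeout.
     * Returns false on timeout; the caller is expected to proceed either way.
     */
    boolean acquire() {
        try {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns the given number of permits, e.g. when a response arrives. */
    void release(int count) {
        permits.release(count);
    }

    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        // Small numbers so the demo finishes quickly; the issue describes 500 permits and 5 sec.
        OperationThrottle throttle = new OperationThrottle(2, 50);
        for (int op = 1; op <= 3; op++) {
            boolean gotPermit = throttle.acquire();
            System.out.println("operation " + op + " acquired permit: " + gotPermit);
        }
    }
}
```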

When initially implemented, the throttling essentially limited the number of outstanding modification messages in flight, because TransactionContextImpl sent one message per modification. On each message response, the semaphore permit was released.

However, TransactionContextImpl now batches modification operations and, on each batched message response, releases semaphore permits equal to the batch count. Since a permit is acquired for each operation up front, the transaction can stall if the batch count exceeds the initial semaphore count. E.g., with the initial permits at 500 and the batch count at 1000, the permits will be used up after 500 modifications, but no batched message has been sent yet, so nothing releases them. The next operation won't be able to acquire a permit and will time out after 5 sec. The same will happen for the next operation and so forth until the batch count is reached.
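The stall can be illustrated with a small model. This is a sketch under stated assumptions, not the controller code: the class BatchStallSimulation and its method are hypothetical, a non-blocking tryAcquire() stands in for "waited the full timeout and gave up", and the batch is assumed to be sent only after the last operation of the batch has been added.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical model of one batch being filled while permits are taken per operation. */
final class BatchStallSimulation {

    /**
     * Counts how many operations of a single batch would time out
     * before the batch is sent and its permits are released.
     */
    static int countTimeouts(int initialPermits, int batchCount) {
        Semaphore permits = new Semaphore(initialPermits);
        int timeouts = 0;
        for (int op = 0; op < batchCount; op++) {
            // No response can arrive yet, so no permits come back during the loop.
            if (!permits.tryAcquire()) {
                timeouts++; // models a full timeout; the operation still proceeds
            }
        }
        // Only at this point would the batched message be sent and,
        // on its response, batchCount permits be released.
        return timeouts;
    }

    public static void main(String[] args) {
        int timeouts = countTimeouts(500, 1000);
        System.out.println("timed-out operations: " + timeouts);
        System.out.println("seconds spent waiting at 5 sec each: " + (timeouts * 5));
    }
}
```

In this model, 500 permits and a batch count of 1000 give 500 timed-out operations, while any batch count at or below the permit count gives none.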

When setting the initial permits, we need to take the batch count into account. Also, we shouldn't use the mailbox limit, as it doesn't apply to the unbounded mailboxes. It seems we should multiply the batch count by the max allowed number of in-flight batches.
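The proposed sizing could be sketched as follows. The names PermitSizing, initialPermits, and maxInflightBatches are hypothetical, and the value 4 in the demo is an arbitrary illustration, not a setting from the controller.

```java
/** Hypothetical helper for the sizing rule proposed above. */
final class PermitSizing {

    /** Initial permits = batch count times the maximum number of in-flight batches. */
    static int initialPermits(int batchCount, int maxInflightBatches) {
        if (batchCount <= 0 || maxInflightBatches <= 0) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(batchCount, maxInflightBatches);
    }

    public static void main(String[] args) {
        // Arbitrary example: batch count 1000, at most 4 batches in flight.
        System.out.println("initial permits: " + initialPermits(1000, 4));
    }
}
```

With this rule the permit count can never be smaller than one full batch, so the first batch can always be filled without an operation timing out.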



 Comments   
Comment by Moiz Raja [ 10/Jun/15 ]

https://git.opendaylight.org/gerrit/22244
