[CONTROLLER-1593] C: delete ds benchmark regression Created: 07/Feb/17 Updated: 25/Jul/23 Resolved: 21/Apr/17 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug |
| Reporter: | Peter Gubka | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Operating System: All |
| External issue ID: | 7747 |
| Description |
|
Based on the graph https://jenkins.opendaylight.org/releng/view/controller/job/controller-csit-3node-periodic-benchmark-all-carbon/plot/getPlot?index=6&width=900&height=900 there is a regression in the DELETE operation. The comparison is done for builds #130 and #170. The log files can be taken from the links above. The regression happened only for the config datastore (both leader and follower); operational was not affected. |
| Comments |
| Comment by Peter Gubka [ 07/Feb/17 ] |
|
It may be related to an older issue https://bugs.opendaylight.org/show_bug.cgi?id=7390 |
| Comment by Peter Gubka [ 07/Feb/17 ] |
|
Affected is only the config datastore, for both leader and follower. Times are in milliseconds and are taken from builds #130 and #170. |
| Comment by Robert Varga [ 13/Apr/17 ] |
|
The graph shows around a 6x regression in delete performance on January 9th. This is not reflected in Boron, so it is caused by something we did in Carbon around that time frame. Since this is the config data store, there are two possible causes: a change in yangtools or a change in controller.
I did not find anything in yangtools. There are two candidates in controller:
Is there a way to narrow down the window in which the regression occurred? |
| Comment by Robert Varga [ 20/Apr/17 ] |
|
So far the best candidate for causing the regression is https://git.opendaylight.org/gerrit/49404. |
| Comment by Robert Varga [ 20/Apr/17 ] |
|
That is based on the fact that the regression occurred somewhere between builds #133 and #143. The patch introduced a change in behaviour: we now perform DataTree operations and invoke listeners on the follower before responding to the shard leader. I think that is what limits delete performance here, as AppendEntries consensus is built up more slowly than before and is affected by listener code execution – notably by DataChangeListeners, which are not off-loaded and are expensive to compute. |
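For illustration, here is a minimal Java sketch contrasting the two orderings under discussion. It is not the controller code; the class and method names (FollowerSketch, applyToDataTree, notifyListeners, replyToLeader) are made up, and the real follower's listener handling is more involved than shown here.

```java
// Hypothetical, simplified sketch of the two follower-side orderings discussed above.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

public class FollowerSketch {
    private final Executor applyExecutor = Executors.newSingleThreadExecutor();

    // Reply-first ordering: queue the apply work and acknowledge the leader
    // immediately. Consensus on the next AppendEntries is not gated on the
    // local apply or on listener cost.
    void handleAppendEntriesReplyFirst(long index) {
        CompletableFuture.runAsync(() -> {
            applyToDataTree(index);      // ApplyState happens later, off the critical path
            notifyListeners(index);
        }, applyExecutor);
        replyToLeader(index);
    }

    // Apply-first ordering: perform the DataTree operations (and kick off listener
    // handling) before acknowledging, so the reply is delayed by that work.
    void handleAppendEntriesApplyFirst(long index) {
        applyToDataTree(index);
        notifyListeners(index);
        replyToLeader(index);
    }

    private void applyToDataTree(long index) { /* commit the candidate to the DataTree */ }
    private void notifyListeners(long index) { /* generate DCL/DTCL notifications */ }
    private void replyToLeader(long index)   { /* send AppendEntriesReply(success, index) */ }
}
```

In the apply-first variant, anything that slows down the apply path, including listener-related work, directly delays the acknowledgement the leader is waiting on.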
| Comment by Tom Pantelis [ 20/Apr/17 ] |
|
(In reply to Robert Varga from comment #5) That patch fixed an inconsistency between lastAppliedIndex and what is actually applied to the data tree for snapshots. ApplyState actually occurs after consensus, as the leader has to first get consensus and update the commitIndex before updating lastAppliedIndex. However, an AppendEntries could append new entries for consensus and also apply previous entries. In the end the state will still eventually be applied and block subsequent AppendEntries, but replying before it applies the previous state, as you mentioned, should get us back to the previous behavior and allow the leader to proceed with consensus faster.
DataChangeListener notification generation is offloaded onto a separate actor, but there is a 10 ms mailbox-push-timeout-time which can block the shard actor. However, if we're hitting that then we're reaching the 5000 mailbox-capacity and are thus in danger of dropping notifications. They would go to dead letters – hopefully that's not occurring. |
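A minimal sketch of the snapshot inconsistency being described, under simplified assumptions (SnapshotRaceSketch and its fields are hypothetical stand-ins, not the controller's classes): lastAppliedIndex is advanced and the apply is queued before the reply, so a snapshot captured before the queued ApplyState runs records an index that the captured data does not yet reflect.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class SnapshotRaceSketch {
    private long lastAppliedIndex = -1;
    private long dataTreeAppliedIndex = -1;              // how far the data tree really is
    private final Queue<Long> pendingApplyState = new ArrayDeque<>();

    // Pre-49404-style ordering: bump the index, queue the apply, reply to the leader.
    void handleAppendEntries(long index) {
        lastAppliedIndex = index;
        pendingApplyState.add(index);                    // applied later, asynchronously
        // reply to the leader here
    }

    void processApplyState() {
        Long index;
        while ((index = pendingApplyState.poll()) != null) {
            dataTreeAppliedIndex = index;                // data tree catches up
        }
    }

    // A snapshot triggered before processApplyState() runs is internally inconsistent:
    // it claims lastAppliedIndex but its captured data stops earlier.
    String captureSnapshot() {
        return "snapshot{lastApplied=" + lastAppliedIndex
                + ", dataTreeAt=" + dataTreeAppliedIndex + "}";
    }
}
```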
| Comment by Robert Varga [ 20/Apr/17 ] |
|
Do both DCL and DTCL have dedicated actors? Also, for DCL, determining whom to notify (which happens while the registration tree is locked) is actually expensive, if I remember correctly... |
| Comment by Tom Pantelis [ 20/Apr/17 ] |
|
(In reply to Robert Varga from comment #7) Yes – that's all done on a separate actor. Also, the registration tree lock previously could block the shard on listener registration, but that was recently changed to add new listeners on the actor as well. |
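A rough sketch of that design change, with a plain single-threaded executor standing in for the Akka actor (ListenerOffloadSketch and its methods are illustrative, not the controller's API): registration becomes a message processed on the same thread that generates notifications, so the caller never contends on a registration-tree lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class ListenerOffloadSketch {
    private final ExecutorService notifierActor = Executors.newSingleThreadExecutor();
    private final List<Consumer<String>> listeners = new ArrayList<>();   // only touched on notifierActor

    // Registration is queued as a message; the caller (e.g. the shard) never blocks
    // on the registration state.
    public void registerListener(Consumer<String> listener) {
        notifierActor.execute(() -> listeners.add(listener));
    }

    // Notification generation runs on the same single thread, so no lock is needed.
    public void publishChange(String change) {
        notifierActor.execute(() -> listeners.forEach(l -> l.accept(change)));
    }
}
```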
| Comment by Robert Varga [ 20/Apr/17 ] |
|
I guess the question is when we consider the entry as applied. There are essentially five choices: 1) when we have validated the AppendEntries. Previously we have done 1), currently we are doing 4), and your patch reverts that to 1) (I think). I think we want to go at least as far as 3), as at that point our internal journal records match what we have told the leader and the DataTree actually reflects those changes. I wonder what will happen if we send the response to the leader and then fail due to an exception before we finish processing the message... |
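As a rough illustration of the "at least as far as 3)" point, here is a minimal sketch (AckAfterApplySketch is hypothetical; plain lists stand in for the journal and the DataTree) in which the success reply is only sent once both the local journal record and the DataTree change exist, so an acknowledged entry cannot be lost by a failure immediately after the reply.

```java
import java.util.ArrayList;
import java.util.List;

public class AckAfterApplySketch {
    private final List<String> journal = new ArrayList<>();
    private final List<String> dataTree = new ArrayList<>();   // stand-in for the real DataTree

    // Reply only after the journal and the DataTree both reflect the entry,
    // so what we told the leader matches local state.
    boolean handleAppendEntry(String entry) {
        journal.add(entry);       // journal record persisted
        dataTree.add(entry);      // change applied to the DataTree
        return true;              // only now: AppendEntriesReply(success)
    }
}
```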
| Comment by Robert Varga [ 20/Apr/17 ] |
|
Ah, right, both notifications are offloaded since that recent patch... so I guess we'll give it a day or two to see whether the performance has changed. |
| Comment by Tom Pantelis [ 20/Apr/17 ] |
|
(In reply to Robert Varga from comment #10) My new patch essentially reverts it back to the behavior prior to https://git.opendaylight.org/gerrit/49404 in terms of when the reply is sent with respect to applying previous state. Previously, handleAppendEntries updated lastAppliedIndex and queued an ApplyState message to apply to the data tree, then replied to the leader, so handleAppendEntries always returned before applying the previous state. However, this resulted in an inconsistency in the persisted state between lastAppliedIndex and the actual applied state in the snapshot if one was triggered immediately after, but before ApplyState was processed. Patch 49404 fixed that by making the two atomic. This new patch merely reverts it back to the behavior of handleAppendEntries replying before the previous state is applied.
> I think we want to go at least as far as 3)
As far as the leader is concerned when sending new entries, it's only interested in as far as 2). Remember, normally it takes 2 AppendEntries to get an entry applied to the state. The first is sent with the new entry to append to the journal, and the second is sent with the updated commit index after the first one responds and consensus is reached. The second AppendEntries may contain subsequent new entries, and prior ones may be applied to the state as a side effect, but the leader is not concerned with the latter and we don't need to block the reply.
> I wonder what will happen if we send the response to the leader and we fail due to an exception before we finish processing the message...
It doesn't matter – either way we respond to the leader. We don't report apply-state failures to the leader. The leader is only concerned that entries were appended to the journal and persisted. Raft doesn't define apply-state failures with respect to the leader. |
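A self-contained sketch of the two-round flow described above, using simplified Raft semantics rather than the controller implementation (TwoRoundSketch, Follower and their fields are made up): round one replicates the entry, round two carries the advanced commit index, and applying to the state is a follower-side side effect the leader never waits on.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoRoundSketch {
    static class Follower {
        final List<String> journal = new ArrayList<>();
        final List<String> applied = new ArrayList<>();
        long commitIndex = -1;

        // Append and persist any new entries, apply whatever the leader has already
        // committed, then ack. The ack does not depend on the apply.
        boolean appendEntries(List<String> entries, long leaderCommit) {
            journal.addAll(entries);
            applyUpTo(leaderCommit);
            return true;                                        // AppendEntriesReply(success)
        }

        private void applyUpTo(long leaderCommit) {
            while (applied.size() <= leaderCommit && applied.size() < journal.size()) {
                applied.add(journal.get(applied.size()));       // side effect; leader doesn't care
            }
            commitIndex = Math.max(commitIndex, leaderCommit);
        }
    }

    public static void main(String[] args) {
        Follower f = new Follower();
        // Round 1: replicate entry 0; the leader's commitIndex is still -1.
        f.appendEntries(List.of("delete /foo"), -1);
        // The leader now has consensus for index 0 and advances commitIndex to 0.
        // Round 2: the next AppendEntries carries commitIndex=0; entry 0 gets applied.
        f.appendEntries(List.of(), 0);
        System.out.println("applied=" + f.applied + " commitIndex=" + f.commitIndex);
    }
}
```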
| Comment by Tomas Cere [ 21/Apr/17 ] |
|
Tested locally; delete performance seems to be back to Boron numbers. |