[CONTROLLER-1593] C: delete ds benchmark regression Created: 07/Feb/17  Updated: 25/Jul/23  Resolved: 21/Apr/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Peter Gubka Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 7747

 Description   

Based on the graph https://jenkins.opendaylight.org/releng/view/controller/job/controller-csit-3node-periodic-benchmark-all-carbon/plot/getPlot?index=6&width=900&height=900 there is a regression in the DELETE operation.

The comparison is done between builds #130 and #170:
https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-periodic-benchmark-only-carbon/130/archives/
https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-periodic-benchmark-only-carbon/170/archives/

The log files can be taken from the links above.
Log files odl1_karaf.log.gz and odl2_karaf.log.gz are huge after unzipping and are full of lines like
2017-02-02 12:54:11,425 | WARN | tp1150926568-510 | SimpletxDomRead | 294 - org.opendaylight.controller.dsbenchmark - 1.3.0.SNAPSHOT | optionalDataObject is either null or .isPresent is false
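
For context, that warning comes from the dsbenchmark reader checking the Optional returned by a DOM read. Below is a minimal, hedged sketch of that read pattern; it assumes the Carbon-era controller MD-SAL DOM APIs, and the class name and the 'broker'/'path' parameters are hypothetical stand-ins rather than the actual SimpletxDomRead code.

    // Hedged sketch of the read pattern behind the warning; not the actual SimpletxDomRead source.
    import com.google.common.base.Optional;
    import org.opendaylight.controller.md.sal.common.api.data.LogicalDatastoreType;
    import org.opendaylight.controller.md.sal.common.api.data.ReadFailedException;
    import org.opendaylight.controller.md.sal.dom.api.DOMDataBroker;
    import org.opendaylight.controller.md.sal.dom.api.DOMDataReadOnlyTransaction;
    import org.opendaylight.yangtools.yang.data.api.YangInstanceIdentifier;
    import org.opendaylight.yangtools.yang.data.api.schema.NormalizedNode;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public final class SimpleTxDomReadSketch {
        private static final Logger LOG = LoggerFactory.getLogger(SimpleTxDomReadSketch.class);

        public static void readOnce(final DOMDataBroker broker, final YangInstanceIdentifier path)
                throws ReadFailedException {
            final DOMDataReadOnlyTransaction tx = broker.newReadOnlyTransaction();
            final Optional<NormalizedNode<?, ?>> optionalDataObject =
                    tx.read(LogicalDatastoreType.CONFIGURATION, path).checkedGet();
            if (optionalDataObject == null || !optionalDataObject.isPresent()) {
                // This is the condition that floods odl1_karaf.log and odl2_karaf.log in the runs above.
                LOG.warn("optionalDataObject is either null or .isPresent is false");
            }
            tx.close();
        }
    }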

The regression happened only for the config datastore (both leader and follower); the operational datastore was not affected.



 Comments   
Comment by Peter Gubka [ 07/Feb/17 ]

It may be related to an older issue https://bugs.opendaylight.org/show_bug.cgi?id=7390

Comment by Peter Gubka [ 07/Feb/17 ]

Only the config datastore is affected, for both leader and follower. Times are in milliseconds and are taken from builds #130 and #170.

#130 (ms)   #170 (ms)   Test case
61.082625 977.885625 FOL_CONFIG-BINDING-AWARE-SIMPLE-TX-DELETE-1/100000OUTER/INNER-1OP-EXEC
47.062875 935.484125 FOL_CONFIG-BINDING-AWARE-TX-CHAINING-DELETE-1/100000OUTER/INNER-1OP-EXEC
33.681125 979.43 FOL_CONFIG-BINDING-INDEPENDENT-SIMPLE-TX-DELETE-1/100000OUTER/INNER-1OP-EXEC
41.708875 977.866625 FOL_CONFIG-BINDING-INDEPENDENT-TX-CHAINING-DELETE-1/100000OUTER/INNER-1OP-EXEC
86.716 703.5745 CONFIG-BINDING-AWARE-SIMPLE-TX-DELETE-1/100000OUTER/INNER-1OP-EXEC
92.4615 707.398375 CONFIG-BINDING-AWARE-TX-CHAINING-DELETE-1/100000OUTER/INNER-1OP-EXEC
84.083875 676.697125 CONFIG-BINDING-INDEPENDENT-SIMPLE-TX-DELETE-1/100000OUTER/INNER-1OP-EXEC
92.229 700.476 CONFIG-BINDING-INDEPENDENT-TX-CHAINING-DELETE-1/100000OUTER/INNER-1OP-EXEC
8.877125 2.464 FOL_OPERATIONAL-BINDING-AWARE-SIMPLE-TX-DELETE-1/100000OUTER/INNER-1OP-EXEC
8.02175 4.815375 FOL_OPERATIONAL-BINDING-AWARE-TX-CHAINING-DELETE-1/100000OUTER/INNER-1OP-EXEC
2.282 2.27075 FOL_OPERATIONAL-BINDING-INDEPENDENT-SIMPLE-TX-DELETE-1/100000OUTER/INNER-1OP-EXEC
7.426875 4.394625 FOL_OPERATIONAL-BINDING-INDEPENDENT-TX-CHAINING-DELETE-1/100000OUTER/INNER-1OP-EXEC
0.612625 1.598125 OPERATIONAL-BINDING-AWARE-SIMPLE-TX-DELETE-1/100000OUTER/INNER-1OP-EXEC
1.25825 1.911375 OPERATIONAL-BINDING-AWARE-TX-CHAINING-DELETE-1/100000OUTER/INNER-1OP-EXEC
0.535875 2.062625 OPERATIONAL-BINDING-INDEPENDENT-SIMPLE-TX-DELETE-1/100000OUTER/INNER-1OP-EXEC
0.86975 2.579875 OPERATIONAL-BINDING-INDEPENDENT-TX-CHAINING-DELETE-1/100000OUTER/INNER-1OP-EXEC

Comment by Robert Varga [ 13/Apr/17 ]

The graph shows around a 6x regression in delete performance on January 9th. This is not reflected in Boron, so it is caused by something we did in Carbon around that time frame.

Since this is config data store, there are two possible causes:

  • a change in yangtools to DataTree which makes deletes less efficient
  • a change in controller around persistence

I did not find anything in yangtools. There are two candidates in controller:

Is there a way to narrow down the window in which the regression occurred?

Comment by Robert Varga [ 20/Apr/17 ]

So far the best candidate for causing the regression is https://git.opendaylight.org/gerrit/49404.

Comment by Robert Varga [ 20/Apr/17 ]

That is based on the fact that the regression occurred somewhere before builds #133 and #143.

The patch introduced a change in behaviour: we are now performing DataTree operations and invoking listeners on the follower before responding to the shard leader. I think that is what limits delete performance here, as AppendEntries consensus is built up more slowly than before and is actually affected by listener code execution – notably by DataChangeListeners, which are not off-loaded and are expensive to compute.

Comment by Tom Pantelis [ 20/Apr/17 ]

(In reply to Robert Varga from comment #5)
> That is based on the fact that the regression occurred somewhere before
> builds #133 and #143.
>
> The patch introduced a change in behaviour – we are now performing DataTree
> operations and invoking listeners on the follower before responding to the
> shard leader, which I think is what limits delete performance here as
> AppendEntry consensus is built up more slowly than before and is actually
> affected by listener code execution – notably by DataChangeListener, which
> are not off-loaded and are expensive to compute.

That patch fixed an inconsistency issue between lastAppliedIndex and what is actually applied to the data tree for snapshots.

ApplyState actually occurs after consensus, as the leader has to first get consensus and update the commitIndex before updating lastAppliedIndex. However, an AppendEntries could append a new entry (or entries) for consensus and also apply previous entries.

In the end, the state will still eventually apply and block subsequent AppendEntries, but replying before it applies the previous state, as you mentioned, should get us back to the previous behavior and allow the leader to proceed with consensus faster.

DataChangeListener notification generation is offloaded to a separate actor, but there is a 10 ms mailbox-push-timeout-time which can block the shard actor. However, if we're hitting that then we're reaching the 5000 mailbox-capacity and are thus in danger of dropping notifications. They would go to dead letters - hopefully that's not occurring.
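
To illustrate the mailbox semantics being described (plain Java, not the actual controller or Akka code): with a bounded mailbox, the sender only blocks for the push timeout once the queue is already at capacity, and a timed-out offer means the message is effectively dropped. The capacity and timeout below mirror the values mentioned in this comment; the class and method names are hypothetical.

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    // Plain-Java illustration of bounded-mailbox semantics; not controller or Akka code.
    final class BoundedMailboxSketch {
        // Mirrors mailbox-capacity = 5000.
        private final LinkedBlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>(5000);

        boolean enqueueNotification(final Runnable notification) throws InterruptedException {
            // Blocks the caller (here: the shard actor) for at most 10 ms, mirroring
            // mailbox-push-timeout-time, but only when the queue already holds 5000 messages.
            final boolean accepted = mailbox.offer(notification, 10, TimeUnit.MILLISECONDS);
            if (!accepted) {
                // In Akka terms the message would go to dead letters; the notification is lost.
            }
            return accepted;
        }
    }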

Comment by Robert Varga [ 20/Apr/17 ]

Do both DCL and DTCL have dedicated actors? Also, for DCL determining whom to notify (i.e. while the registration tree is locked) is actually expensive, if I remember correctly...

Comment by Tom Pantelis [ 20/Apr/17 ]

Submitted https://git.opendaylight.org/gerrit/#/c/55733/

Comment by Tom Pantelis [ 20/Apr/17 ]

(In reply to Robert Varga from comment #7)
> Do both DCL and DTCL have dedicated actors? Also, for DCL determining whom
> to notify (i.e. while the registration tree is locked) is actually
> expensive, if I remember correctly...

Yes - that's all done on a separate actor. Also, the registration tree lock could previously block the shard on listener registration, but that was recently changed to add new listeners on the actor as well.

Comment by Robert Varga [ 20/Apr/17 ]

I guess the question is when do we consider the entry as applied. There are essentially five choices:

1) when we have validated the AppendEntries
2) when we have updated the log indices
3) when we have applied the entries to DataTree
4) when we have queued up notifications
5) when notifications have been delivered

Previously we did 1), currently we are doing 4), and your patch reverts that to 1) (I think).

I think we want to go at least as far as 3), as at that point our internal journal records match what we have told the leader and DataTree actually reflects those changes.

I wonder what will happen if we send the response to the leader and we fail due to an exception before we finish processing the message...
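
For reference, here is a rough pseudocode sketch of where the five points above fall in a follower's AppendEntries handling; all names are hypothetical stand-ins, not the actual org.opendaylight.controller.cluster.raft code.

    // Hedged pseudocode of follower-side AppendEntries handling; every name is a stand-in.
    final class FollowerSketch {
        void handleAppendEntries(final Object request) {
            if (!validate(request)) {                 // 1) validate the AppendEntries
                reply(false);
                return;
            }
            updateLogIndices(request);                // 2) update the log indices
            applyCommittedEntriesToDataTree(request); // 3) apply the entries to the DataTree
            queueListenerNotifications(request);      // 4) queue up notifications
            // 5) delivery of notifications happens later, on the listener actors
            reply(true); // where this reply sits relative to 1)-4) is the question under discussion
        }

        private boolean validate(final Object request) { return true; }
        private void updateLogIndices(final Object request) { }
        private void applyCommittedEntriesToDataTree(final Object request) { }
        private void queueListenerNotifications(final Object request) { }
        private void reply(final boolean success) { }
    }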

Comment by Robert Varga [ 20/Apr/17 ]

Ah, right, both notification types have been offloaded since that recent patch... so I guess we'll give it a day or two to see if the performance has changed.

Comment by Tom Pantelis [ 20/Apr/17 ]

(In reply to Robert Varga from comment #10)
> I guess the question is when do we consider the entry as applied. There are
> essentially five choices:
>
> 1) when we have validated the AppendEntries
> 2) when we have updated the log indices
> 3) when we have applied the entries to DataTree
> 4) when we have queued up notifications
> 5) when notifications have been delivered
>
> Previously we have done 1), currently we are doing 4), your patch reverts
> that to 1) (I think).

My new patch essentially reverts it back to the behavior prior to https://git.opendaylight.org/gerrit/49404, in terms of when the reply is sent with respect to applying previous state. Previously, handleAppendEntries updated lastAppliedIndex and queued an ApplyState message to apply to the data tree, then replied to the leader, so handleAppendEntries always returned before applying the previous state. However, this resulted in an inconsistency in the persisted state between lastAppliedIndex and the actual applied state in the snapshot if one was triggered immediately after, but before ApplyState was processed. Patch 49404 fixed that by making the two atomic. This new patch merely reverts it back to the behavior of handleAppendEntries replying before the previous state is applied.

>
> I think we want to go at least as far as 3), as at that point our internal
> journal records match what we have told the leader and DataTree actually
> reflects those changes.

As far as the leader is concerned when sending new entries, it's only interested in going as far as 2). Remember, it normally takes two AppendEntries to get an entry applied to the state: the first is sent with the new entry to append to the journal, and the second is sent with the updated commit index after the first one is acknowledged and consensus is reached. The second AppendEntries may contain subsequent new entries, and prior ones may be applied to the state as a side effect, but the leader is not concerned with the latter and we don't need to block the reply.
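
(A rough sketch of that two-round-trip flow from the leader's side; the names are hypothetical stand-ins, not the actual leader implementation.)

    // Hedged sketch of the leader-side flow described above; all names are stand-ins.
    final class LeaderSketch {
        private long commitIndex;
        private long lastAppliedIndex;

        void replicate(final Object newEntry) {
            // Round trip 1: ship the new entry; followers append it to their journals and reply.
            sendAppendEntries(newEntry, commitIndex);
        }

        void onAppendEntriesReply(final long matchIndex) {
            if (majorityHasReplicated(matchIndex)) {
                commitIndex = matchIndex;       // consensus reached - all the leader needs (point 2)
                applyToStateUpTo(commitIndex);  // leader applies locally
                lastAppliedIndex = commitIndex;
                // Round trip 2: the next AppendEntries carries the updated commitIndex (possibly
                // with newer entries); followers apply earlier entries as a side effect, which
                // the leader does not wait for.
                sendAppendEntries(null, commitIndex);
            }
        }

        private void sendAppendEntries(final Object entry, final long leaderCommit) { }
        private boolean majorityHasReplicated(final long index) { return true; }
        private void applyToStateUpTo(final long index) { }
    }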

>
> I wonder what will happen if we send the response to the leader and we fail
> due to an exception before we finish processing the message...

It doesn't matter - either way we respond to the leader. We don't report apply-state failures to the leader; the leader is only concerned that entries were appended to the journal and persisted. Raft doesn't define apply-state failures with respect to the leader.

Comment by Tomas Cere [ 21/Apr/17 ]

Tested locally; delete performance seems to be back to Boron numbers.
