[NETCONF-355] Performance regression in controller-rest-cars-perf Created: 28/Feb/17 Updated: 15/Mar/19 Resolved: 21/Apr/17 |
|
| Status: | Verified |
| Project: | netconf |
| Component/s: | restconf-nb |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Vratko Polak | Assignee: | Robert Varga |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Attachments: |
|
| External issue ID: | 7868 |
| Description |
|
The regression may be in clustering code, or in Restconf code. The biggest difference is in Add_Cars test case, which uses a script [0] to send single POST request to create 10000 car items. But also Add_People test case takes longer, compare Boron durations [1] to Beryllium [2]. [0] https://github.com/opendaylight/integration-test/blob/master/tools/odl-mdsal-clustering-tests/scripts/cluster_rest_script.py |
| Comments |
| Comment by Robert Varga [ 23/Mar/17 ] |
|
Well, Verify_Cars fails, can they be related? To quantify this a bit, we are seeing about 100% performance degradation observed from test duration. |
| Comment by Vratko Polak [ 29/Mar/17 ] |
|
> Verify_Cars fails I do not see that in recent builds. Links without specific build number: Note that those are single node tests, the suite runs also in three node setup [6], but the difference in performance is small. [3] https://jenkins.opendaylight.org/releng/view/controller/job/controller-csit-1node-rest-cars-perf-only-beryllium/lastSuccessfulBuild/robot/controller-rest-cars-perf.txt/ |
| Comment by Vratko Polak [ 11/Apr/17 ] |
|
Numeric values (for times when Beryllium jobs are removed) for Add_Cars: |
| Comment by Robert Varga [ 18/Apr/17 ] |
|
Testing with distribution-karaf-0.6.0-20170418.121829-4936, it looks like the add case is doing an awful lot of reads. I will profile this further to get to the bottom of this. |
| Comment by Ivan Hrasko [ 18/Apr/17 ] |
|
Add must check for existence (by read) but reads were also in beryllium, There are some modifications in master compare to beryllium (uri parameters) but in this simple case they are not applied. When profiled beryllium and master and compared there were a lot of changes in Yangtools, maybe added some additional verifications for data. |
| Comment by Ivan Hrasko [ 18/Apr/17 ] |
|
Did you find more reads in recent versions? |
| Comment by Robert Varga [ 18/Apr/17 ] |
|
Well, the reason for the regression is in RESTCONF. The regression was introduced in https://git.opendaylight.org/gerrit/#/c/30554/48/restconf/sal-rest-connector/src/main/java/org/opendaylight/netconf/sal/restconf/impl/BrokerFacade.java@320, as now BrokerFacade issues a read request for each individual item in the list. |
| Comment by Robert Varga [ 18/Apr/17 ] |
|
So, the question is: what exactly is the reason for issuing the massive amounts of reads? I would prefer exact quotes from drafts, because the code sure as hell does not contain any sort of documentation about what is going on. |
| Comment by Robert Varga [ 18/Apr/17 ] |
|
Attachment Main-2017-04-18-2.snapshot.xz has been added with description: YKP tracing snapshot |
| Comment by Robert Varga [ 18/Apr/17 ] |
|
Snapshot shows BrokerFacade executing 10K read operations. Each of those is being waited synchronously, which is not something you can do while expecting reasonable performance. |
| Comment by Robert Varga [ 18/Apr/17 ] |
|
Furthermore, these reads are executed as the target is being modified, which is forcing DataTree to apply operations so it can report accurate results. |
| Comment by Ivan Hrasko [ 19/Apr/17 ] |
|
easy data - POST empty container |
| Comment by Ivan Hrasko [ 19/Apr/17 ] |
|
Attachment time results.tar.gz has been added with description: result of time measure between Be-4 and master running the same data |
| Comment by Ivan Hrasko [ 19/Apr/17 ] |
|
(In reply to Robert Varga from comment #7) -this code is for PACTH operation, we are dealing now with POST (postDataViaTransaction method) |
| Comment by Ivan Hrasko [ 19/Apr/17 ] |
|
when batch size for command is used wall time difference between master and be-4 arises command: |
| Comment by Ivan Hrasko [ 19/Apr/17 ] |
|
Attachment compare has been added with description: compare be-4 and master when batch size is 10000 |
| Comment by Robert Varga [ 19/Apr/17 ] |
|
carbon: https://git.opendaylight.org/gerrit/55205 This will need a proper follow-up to address overall code structure like:
|
| Comment by Martin Ciglan [ 19/Apr/17 ] |
|
will sit down with Ivan & Jakub to discuss follow-up notes |
| Comment by Vratko Polak [ 19/Apr/17 ] |
|
Boron looks fixed [7]: Carbon merged, waiting for test run. [7] https://jenkins.opendaylight.org/releng/view/controller/job/controller-csit-1node-rest-cars-perf-only-boron/947/robot/controller-rest-cars-perf.txt/Add%20Cars/ |
| Comment by Vratko Polak [ 20/Apr/17 ] |
|
> Nitrogen [8] merged now. |
| Comment by Vratko Polak [ 21/Apr/17 ] |
|
New Carbon performance [9] number: 0:00:02.314 |
| Comment by Ivan Hrasko [ 21/Apr/17 ] |
|
Follow up bugs/improvements: https://bugs.opendaylight.org/show_bug.cgi?id=8268 |
| Comment by Ivan Hrasko [ 21/Apr/17 ] |
|
Also performance issue: https://bugs.opendaylight.org/show_bug.cgi?id=6412 |