[NETCONF-170] Poor performance of MDSAL Northbound mapped on a Netconf connector Created: 22/Mar/16  Updated: 15/Mar/19  Resolved: 13/Apr/17

Status: Resolved
Project: netconf
Component/s: netconf
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Jozef Behran Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Zip Archive 5581.zip    
External issue ID: 5581

 Description   

When RestPerfClient is run on MDSAL Northbound directly, the MDSAL Northbound manages to serve over 400 requests per second. However, when it is run on MDSAL Northbound mapped on a Netconf Connector, the request rate drops significantly. The RestPerfClient output log attached to NETCONF-169 suggests it is below 220 per second, but the exact rate could not be determined because of NETCONF-169 combined with the lack of support in the performance test suite for configuring the timeouts in RestPerfClient (which is a new feature).

Yes, I expect a performance drop when going through a Netconf connector versus going directly but a performance drop of over 50% is way too severe.

Looking into the logs reveals a massive number of "Timing out for 1 session(s) with id <24-to-26-random-chars>" warnings emitted while RestPerfClient was sending requests to the Netconf connector mapped MDSAL Northbound.



 Comments   
Comment by Andrej Mak [ 12/Sep/16 ]

Hi, if you compare the steps executed during direct and mountpoint access to the ODL datastore, you can see that access via the netconf connector is much more complex.

Direct approach:
1. HTTP server
2. URL, XML -> YangInstanceIdentifier, NormalizedNode
3. Datastore write

Via netconf connector:
1. HTTP server
2. URL, XML -> YangInstanceIdentifier, NormalizedNode
3. YangInstanceIdentifier, NormalizedNode -> Netconf RPC XML
4. Netconf southbound Netty client - exi handlers, ssh, network io
5. Netconf northbound Netty server - exi handlers, ssh, network io
6. Netconf RPC XML -> YangInstanceIdentifier, NormalizedNode
7. Datastore write

Additionally, netconf mountpoint write transaction operations are implemented as a sequence of multiple netconf RPCs (lock, edit-config, commit, unlock),
which causes additional network I/O overhead.
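
As a rough illustration of the four-RPC sequence named above, the following Python sketch builds the NETCONF messages that a single mountpoint write might produce. It is not the connector's actual code: the helper names (rpc, targeted, write_transaction), the choice of the candidate datastore as the target, and the sample payload are all assumptions made for illustration. Only the operation names (from this comment) and the NETCONF base namespace (from RFC 6241) are taken as given.

```python
import xml.etree.ElementTree as ET

# NETCONF base namespace defined in RFC 6241.
NS = "urn:ietf:params:xml:ns:netconf:base:1.0"


def rpc(message_id, operation):
    """Wrap an operation element in an <rpc> envelope and serialize it."""
    envelope = ET.Element(f"{{{NS}}}rpc", {"message-id": str(message_id)})
    envelope.append(operation)
    return ET.tostring(envelope, encoding="unicode")


def targeted(name, datastore="candidate"):
    """Build <name><target><datastore/></target></name> (hypothetical helper)."""
    op = ET.Element(f"{{{NS}}}{name}")
    target = ET.SubElement(op, f"{{{NS}}}target")
    ET.SubElement(target, f"{{{NS}}}{datastore}")
    return op


def write_transaction(config_payload):
    """Return the RPCs for one write, in the order named in the comment."""
    edit = targeted("edit-config")
    config = ET.SubElement(edit, f"{{{NS}}}config")
    config.append(config_payload)
    operations = [
        targeted("lock"),
        edit,
        ET.Element(f"{{{NS}}}commit"),
        targeted("unlock"),
    ]
    return [rpc(i, op) for i, op in enumerate(operations, start=1)]


# Sample payload; the element name and namespace are made up for illustration.
payload = ET.Element("{urn:example:cars}car")
ET.SubElement(payload, "{urn:example:cars}id").text = "1"

messages = write_transaction(payload)
for message in messages:
    print(message)
```

In this model each message is a separate request/reply exchange over the SSH session, so one RESTCONF write through the mountpoint costs at least four network round trips where the direct path costs none.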

"Timing out for 1 session(s) with id <24-to-26-random-chars>" warnings are logged by restconf and shouldn't affect this issue.

I've also profiled ODL under this test and haven't found spots suitable for optimization.

Comment by Peter Gubka [ 03/Jan/17 ]

Attachment 5581.zip has been added with description: job files

Comment by Peter Gubka [ 03/Jan/17 ]

All the scale jobs are hitting the same problem: sending REST requests using the netconf connector takes too long.
At the moment (Jan 03) it takes so long that the ssh session is broken during the test and we don't see the poor performance.
All jobs affected: netconf-csit-1node-periodic-scale-<all|only>-<beryllium,boron,carbon>

The test was run using refs/changes/68/49468/8 and all files are attached. The performance drop can be seen by comparing restperfclient-netconf-scale-txt-Mdsaldirect.log.gz and restperfclientnetconf-scale-txt-Mdsal-netconf-connector.log.gz.

13:11:03.692 [main] INFO o.o.n.t.t.c.http.perf.RestPerfClient - FINISHED. Execution time: 11.66 min
13:11:03.698 [main] INFO o.o.n.t.t.c.http.perf.RestPerfClient - Requests per second: 7.148453360630894
vs.
12:59:21.266 [main] INFO o.o.n.t.t.c.http.perf.RestPerfClient - FINISHED. Execution time: 20.38 s
12:59:21.274 [main] INFO o.o.n.t.t.c.http.perf.RestPerfClient - Requests per second: 245.30245793062846

A drop in performance from 245 req/s to 7 req/s is too significant, even if the netconf connector path is more complicated.
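
To put a number on the drop, the short Python sketch below works only from the figures in the two RestPerfClient logs quoted above. It assumes the first log (11.66 min) is the netconf connector run and the second (20.38 s) is the direct run, which matches the 245 vs. 7 req/s comparison. The variable names are illustrative.

```python
# Figures copied from the two RestPerfClient logs quoted above.
connector_rate = 7.148453360630894   # requests/s via the netconf connector
connector_time_s = 11.66 * 60        # execution time of 11.66 min, in seconds
direct_rate = 245.30245793062846     # requests/s directly on MDSAL
direct_time_s = 20.38                # execution time in seconds

# How many times slower the connector path is.
slowdown = direct_rate / connector_rate

# Approximate number of requests in each run (rate * duration).
connector_requests = connector_rate * connector_time_s
direct_requests = direct_rate * direct_time_s

print(f"slowdown factor: {slowdown:.1f}x")
print(f"requests (connector run): {connector_requests:.0f}")
print(f"requests (direct run): {direct_requests:.0f}")
```

Both runs work out to roughly 5000 requests, so the two logs appear to describe the same workload, and the connector path is about 34 times slower on it.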

Comment by Andrej Mak [ 16/Feb/17 ]

Exi handler optimization
https://git.opendaylight.org/gerrit/#/c/51834/
Merge empty list instead of exist-check
https://git.opendaylight.org/gerrit/#/c/51866/
If it is possible, read only list entry instead of whole list for filtering
https://git.opendaylight.org/gerrit/#/c/51643/

With these patches, the test shouldn't fail on timeout and should make ~22 requests/s.
When ssh is disabled, the test can do more than 100 requests/s, so there may be spots for further optimization in netconf SSH handling.
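
A back-of-the-envelope estimate of how much of the per-request time SSH accounts for, using only the two rates above. This is a sketch under a strong assumption: requests are issued one at a time, so per-request latency is simply 1 / rate. If RestPerfClient runs several requests concurrently, the absolute latencies below do not hold, although the ratio is still indicative. The variable names are illustrative.

```python
rate_with_ssh = 22.0      # requests/s with the patches applied ("~22" above)
rate_without_ssh = 100.0  # requests/s with ssh disabled ("more than 100", a lower bound)

# Assumption: one request in flight at a time, so latency = 1 / rate.
latency_with_ssh_ms = 1000.0 / rate_with_ssh
latency_without_ssh_ms = 1000.0 / rate_without_ssh

# Time per request attributable to SSH handling under this model.
ssh_cost_ms = latency_with_ssh_ms - latency_without_ssh_ms
ssh_share = ssh_cost_ms / latency_with_ssh_ms

print(f"latency with ssh:    {latency_with_ssh_ms:.1f} ms")
print(f"latency without ssh: {latency_without_ssh_ms:.1f} ms")
print(f"ssh share of request time: {ssh_share:.0%}")
```

Because 100 requests/s is only a lower bound for the no-ssh case, the SSH share computed here (a bit under four fifths of the request time) is itself a lower bound in this model.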

Comment by Robert Varga [ 28/Feb/17 ]

I think the results are fine. We know the SSH integration needs major surgery to get the performance we would like, especially in case of single-threaded access.

Comment by Andrej Mak [ 13/Apr/17 ]

Netconf SSH pipeline improvements are tracked here:
https://bugs.opendaylight.org/show_bug.cgi?id=2484

Generated at Wed Feb 07 20:14:20 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.