[NETCONF-390] Netconf performance in cluster is way worse than in single node Created: 04/Apr/17  Updated: 15/Mar/19  Resolved: 21/Apr/17

Status: Resolved
Project: netconf
Component/s: netconf
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Vratko Polak Assignee: Vratko Polak
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 8149

 Description   

Single node CSIT result [0] is around 400 requests per second.
Three node test (device configured through node 1, device data accessed through node 3) result [1] is almost 20 times worse.

[0] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-1node-periodic-scale-all-carbon/219/archives/log.html.gz#s1-s2-t5-k3-k15-k6-k2-k2
[1] https://logs.opendaylight.org/sandbox/jenkins091/netconf-csit-3node-periodic-clustering-scale-all-carbon/1/archives/log.html.gz#s1-s2-t6-k3-k15-k6-k2-k2



 Comments   
Comment by Miroslav Kovac [ 10/Apr/17 ]

Could you please rerun the sandbox job for 3node clustering?

Comment by Vratko Polak [ 10/Apr/17 ]

The performance for Carbon all test is now known [0] to be around 20.48 requests per second. Interestingly, Carbon only test has slightly worse performance of around 19.26 requests per second, result from Sandbox [1].

[0] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-3node-periodic-clustering-scale-all-carbon/228/archives/log.html.gz#s1-s2-t6-k3-k15-k6-k2-k2
[1] https://logs.opendaylight.org/sandbox/jenkins091/netconf-csit-3node-periodic-clustering-scale-only-carbon/1/archives/log.html.gz#s1-s2-t6-k3-k15-k6-k2-k2
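
For scale, the figures quoted so far can be put side by side. The snippet below is only an arithmetic sketch using the numbers from this issue (roughly 400 requests per second on a single node, 20.48 and 19.26 on three nodes). The variable and function names are illustrative, and the per-request time assumes requests are handled strictly one after another, which the job logs would have to confirm.

```python
# Throughput figures quoted in this issue (requests per second).
single_node_rps = 400.0    # approximate value from the description
cluster_all_rps = 20.48    # 3-node "scale-all" job
cluster_only_rps = 19.26   # 3-node "scale-only" job


def slowdown(baseline_rps, measured_rps):
    """Return how many times slower measured_rps is than baseline_rps."""
    return baseline_rps / measured_rps


def ms_per_request(rps):
    """Average wall-clock time per request in milliseconds,
    assuming (hypothetically) that requests run one at a time."""
    return 1000.0 / rps


for name, rps in (("all", cluster_all_rps), ("only", cluster_only_rps)):
    print(f"{name}: {slowdown(single_node_rps, rps):.1f}x slower, "
          f"{ms_per_request(rps):.1f} ms per request")
```

Both cluster runs come out at about 20 times slower than the single-node figure, which matches the "almost 20 times worse" estimate in the description.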

Comment by Tomas Cere [ 13/Apr/17 ]

Is this a comparison between the clustered and non-clustered features, or the clustered feature running in single-node and 3-node scenarios?

Comment by Miroslav Kovac [ 18/Apr/17 ]

Could you also send me the parameters you are using for testtool and RestPerfClient, please?

I am not really sure how you could get such a big gap in those tests, because when I test locally I see a difference of only 2 to 5 requests. My performance is 17 to 20 requests for netconf-topology and 15 to 17 requests for netconf-clustered-topology.

The only time I got a higher number of requests was when I was sending requests to a mountpoint that had not been created, and which therefore could not write anything. I was wondering whether this could also be the case in the robot test, or something similar.

Comment by Vratko Polak [ 19/Apr/17 ]

> Is this a comparison between the clustered and non-clustered feature

Looking at the job definition: Yes, it is.
Should the clustered feature be that much slower?

> what parameters are you using for testtool and RestPerfClient

Beryllium commands (identical for single node and cluster, except log file name):
Testtool start [4] (nothing in ./schemas):
/usr/lib/jvm/java-7-openjdk-amd64/bin/java -Xmx1G -XX:MaxPermSize=256M -Dorg.apache.sshd.registerBouncyCastle=false -jar netconf-testtool-1.0.5-20170413.092814-27-executable.jar --device-count 1 --debug false --schemas-dir ./schemas --md-sal false >testtool-netconf-scale-txt-Performance.log 2>&1

Restperfclient start [5]:
/usr/lib/jvm/java-7-openjdk-amd64/bin/java -Xmx1G -XX:MaxPermSize=256M -jar netconf-testtool-1.0.5-20170413.092814-27-rest-perf-client.jar --ip 10.29.13.188 --port 8181 --edits 65536 --edit-content request1.json --async-requests true --auth admin admin --timeout 22 --destination /restconf/config/network-topology:network-topology/topology/topology-netconf/node/17830-sim-device/yang-ext:mount/car:cars 2>&1 | tee restperfclient-netconf-scale-txt-Performance.log

request1.json is here [6].

[4] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-1node-periodic-scale-only-beryllium/456/archives/log.html.gz#s1-s2-t1-k3-k7
[5] https://logs.opendaylight.org/releng/jenkins092/netconf-csit-1node-periodic-scale-only-beryllium/456/archives/log.html.gz#s1-s2-t5-k3-k15-k4
[6] https://github.com/opendaylight/integration-test/blob/master/csit/variables/netconf/RestPerfClient/request1.json

Comment by Tomas Cere [ 19/Apr/17 ]

(In reply to Vratko Polák from comment #5)
> > Is this a comparison between the clustered and non-clustered feature
>
> Looking at the job definition: Yes, it is.
> Should the clustered feature be that much slower?
>

Do we know if the requests are going through the slave mountpoint? The master mountpoint should have similar performance numbers to the nonclustered feature, as there's no overhead afaik. I can only see this happening on the slave mountpoints, and I'm not sure there's space for performance improvements there since it's only forwarding the requests.

Comment by Miroslav Kovac [ 20/Apr/17 ]

Could you try starting Restperfclient with --async-requests false, for both the single node and the 3 node cluster?

Comment by Andrej Mak [ 20/Apr/17 ]

https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-1node-periodic-scale-only-carbon/
1node runs Rest perf client with --async-requests true

https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-3node-periodic-clustering-scale-only-carbon/
3node runs Rest perf client with --async-requests false

That's the reason for the poor performance. In the current implementation of netconf clustered topology it isn't possible to use async requests, since it permits only one open write transaction at a time, and async requests would cause IllegalStateExceptions - Transaction is already opened for another user.
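
The single-writer restriction described in this comment can be illustrated with a toy model. This is a sketch only, not the netconf clustered topology code: the class `SingleWriterMountPoint`, its methods, and the two helper functions are hypothetical names, and Python's `RuntimeError` stands in for the Java `IllegalStateException`.

```python
class SingleWriterMountPoint:
    """Toy model: at most one write transaction may be open at a time."""

    def __init__(self):
        self._open = False

    def new_write_transaction(self):
        if self._open:
            # Stand-in for the IllegalStateException quoted in the comment.
            raise RuntimeError("Transaction is already opened for another user")
        self._open = True

    def submit(self):
        # Completing the write closes the transaction.
        self._open = False


def run_sync(mount, edits):
    """Open and submit one transaction at a time; returns completed writes."""
    done = 0
    for _ in range(edits):
        mount.new_write_transaction()
        mount.submit()
        done += 1
    return done


def run_async(mount, edits):
    """Open every transaction before any is submitted, as overlapping
    requests would; returns the number of rejected attempts."""
    failures = 0
    for _ in range(edits):
        try:
            mount.new_write_transaction()
        except RuntimeError:
            failures += 1
    return failures


print("sync writes completed:", run_sync(SingleWriterMountPoint(), 5))
print("async attempts rejected:", run_async(SingleWriterMountPoint(), 5))
```

In this model every sequential write succeeds, while overlapping attempts are rejected after the first one, which is why the 3node job has to run with --async-requests false.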

Comment by Andrej Mak [ 20/Apr/17 ]

https://git.opendaylight.org/gerrit/#/c/52235/ would allow async requests to be used in tests. However, simultaneous writes to a real device via either odl-netconf-topology or odl-netconf-clustered-topology would fail, because a lock rpc is sent to the device before every write and an unlock is sent when the write completes.
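
The lock, write, unlock sequence mentioned in this comment can be sketched the same way. The model below is a simplified assumption about how a device lock behaves (one holder at a time, a second lock request is denied); `SimulatedDevice` and `write` are hypothetical names, not testtool or ODL APIs.

```python
class SimulatedDevice:
    """Toy device whose datastore lock can be held by one session at a time."""

    def __init__(self):
        self.lock_owner = None
        self.data = []

    def lock(self, session):
        if self.lock_owner is not None:
            return False          # lock denied: another session holds it
        self.lock_owner = session
        return True

    def edit_config(self, session, value):
        if self.lock_owner != session:
            raise RuntimeError("session does not hold the lock")
        self.data.append(value)

    def unlock(self, session):
        if self.lock_owner == session:
            self.lock_owner = None


def write(device, session, value):
    """One write as described above: lock, edit, unlock."""
    if not device.lock(session):
        return False
    device.edit_config(session, value)
    device.unlock(session)
    return True


device = SimulatedDevice()

# Sequential writes: each one releases the lock before the next starts.
print(write(device, "A", 1), write(device, "B", 2))

# Overlapping writes: A still holds the lock when B asks for it.
device.lock("A")
print("B while A holds the lock:", write(device, "B", 3))
device.unlock("A")
```

The second write in the overlapping case is refused while the first session still holds the lock, which corresponds to the failure of simultaneous writes on a real device.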

Comment by Vratko Polak [ 20/Apr/17 ]

> Do we know if the requests are going through the slave mountpoint?

Requests are going through member-3, which is practically always a slave.

Comment by Andrej Mak [ 21/Apr/17 ]

https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-1node-periodic-scale-only-carbon/ should be adapted to use --async-requests false

Comment by Vratko Polak [ 21/Apr/17 ]

> use --async-requests false

Suite change [0] tested on Sandbox [1] confirms the (sync) performance is basically the same on 1 and 3 nodes.

[0] https://git.opendaylight.org/gerrit/#/c/55707/2
[1] https://jenkins.opendaylight.org/sandbox/job/netconf-csit-1node-periodic-scale-all-carbon/4/robot/netconf-scale.txt/Performance/Run_Restperfclient/
