[CONTROLLER-1702] Performance: write-transactions is unable to fill the initial item list within 5 seconds Created: 06/Jun/17  Updated: 25/Jul/23  Resolved: 05/Oct/18

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Vratko Polak
Assignee: Unassigned
Resolution: Won't Do
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 8602

 Description   

This is a Carbon pre-SR1 bug not currently visible on RelEng, as it is hidden by CONTROLLER-1687.

Write-transactions is an RPC used to test clustering under load. It starts by creating a list of integer leaf items and then performs random updates on it. The initial list has 2^19 items, written in a single transaction.
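For illustration only, the workload described above can be sketched in Python roughly as follows. This is not the clustering-it-provider code: the function names, the in-memory dict standing in for the datastore, and the fixed seed are all assumptions made for the sketch.

```python
import random

INITIAL_SIZE = 2 ** 19  # 524288 items, the initial size stated in the description


def fill_initial_list(size=INITIAL_SIZE):
    """Build the whole initial item list in one step.

    The single dict construction stands in for the single transaction;
    the dict itself is a hypothetical stand-in for the datastore.
    """
    return {number: number for number in range(size)}


def random_updates(items, count, seed=0):
    """Apply `count` random single-item updates to an existing list."""
    rng = random.Random(seed)
    for _ in range(count):
        key = rng.randrange(len(items))
        items[key] = rng.randrange(len(items))
    return items
```

The point of the sketch is only the shape of the load: one large initial write followed by many small ones, so the first write dominates the start-up time.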

The Robot tests currently wait for 5 seconds before affecting the cluster, so that the transaction has time to be committed. A sandbox test run with the fix for CONTROLLER-1687 shows that the test sometimes passes [0], but sometimes fails with a RequestTimeoutException [1] just a few seconds before the 120-second limit.
However, the response states that the wait was over 120 seconds, and karaf.log [2] confirms that the exception happens in the transaction that fills the initial item list:

2017-06-06 07:19:35,179 | WARN | qtp1316418376-78 | WriteTransactionsHandler | 256 - org.opendaylight.controller.samples.clustering-it-provider - 1.5.1.SNAPSHOT | Unable to fill the initial item list.
TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.cluster.access.client.RequestTimeoutException: Timed out after 120029545279ns]]}
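As a quick check of the reported duration, the nanosecond value in the exception can be converted to seconds. The snippet below is a small Python sketch; the helper name `timeout_seconds` is made up for this example, and the input is the fragment copied from the log above.

```python
import re

# Fragment copied from the karaf.log excerpt above.
LOG_FRAGMENT = "RequestTimeoutException: Timed out after 120029545279ns"


def timeout_seconds(text):
    """Return the 'Timed out after <N>ns' value in seconds, or None if absent."""
    match = re.search(r"Timed out after (\d+)ns", text)
    if match is None:
        return None
    return int(match.group(1)) / 1e9


elapsed = timeout_seconds(LOG_FRAGMENT)
print(f"{elapsed:.3f} s")  # prints "120.030 s"
```

So the request timed out roughly 0.03 seconds past the 120-second mark, which matches the response stating that the wait was over 120 seconds.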

This might be a real performance bug, or it might be expected behavior (the suite performs reboots between tests, so the config datastore is not warmed up yet). If it is expected, we will increase the sleep time in Robot.
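One alternative to lengthening a fixed sleep is to poll for completion up to a deadline. The following is a minimal Python sketch of that idea, not what the Robot suite does: `is_committed` is a hypothetical callable supplied by the caller (for example, a check that the item list can be read back), and the default timeout and interval are assumptions.

```python
import time


def wait_until(is_committed, timeout_s=120.0, interval_s=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll is_committed() until it returns True or timeout_s elapses.

    Returns True on success and False on timeout. `clock` and `sleep`
    are injectable so the helper can be exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_committed():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Compared with a fixed sleep, this returns as soon as the condition holds and only pays the full timeout in the failing case.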

[0] https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/4/log.html.gz#s1-s14-t1
[1] https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/4/log.html.gz#s1-s14-t3-k2-k15-k1-k1-k1-k1-k1-k1-k2-k1-k1-k1
[2] https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/4/odl1_karaf.log.gz



 Comments   
Comment by Tomas Cere [ 06/Jun/17 ]

It's expected; there is no guarantee that the initial write will finish in 5 seconds.

Comment by Vratko Polak [ 08/Jun/17 ]

Since [3], this no longer affects the tests. Lowering the severity, but keeping the issue open as a performance bug.

[3] https://git.opendaylight.org/gerrit/58355

Generated at Wed Feb 07 19:56:14 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.