[OVSDB-358] [SR3] RPC timeout in JsonRPCEndpoint is not configurable Created: 06/Aug/16  Updated: 19/Oct/17  Resolved: 10/Aug/16

Status: Resolved
Project: ovsdb
Component/s: Library
Affects Version/s: unspecified
Fix Version/s: None

Type: Bug
Reporter: Ashwin P Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 6352
Priority: High

 Description   

The RPC timeout in JsonRPCEndpoint/getClient method is 300 ms:
private static final int REAPER_INTERVAL = 300;

This timeout is not configurable, and it can be quite aggressive for heavy-weight operations on OVSDB. This leads to the following exception:
java.util.concurrent.CancellationException: Task was cancelled.

One solution would be to retry if the task was cancelled. However, for operations on OVSDB that genuinely take more than 300 ms, this will not work: the task will be cancelled again.
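To illustrate the behaviour described above, here is a minimal sketch of a reaper that cancels a pending reply after a fixed interval. It is not the code in JsonRPCEndpoint: it uses only JDK classes, and the class name `ReaperSketch`, the method `survives`, and the property name `example.rpc.reaper.interval.ms` are all hypothetical names introduced for illustration.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ReaperSketch {
    // Hypothetical property name; the real config knob may be named differently.
    static final String INTERVAL_PROPERTY = "example.rpc.reaper.interval.ms";

    // Reads the reaper interval from a system property instead of hard-coding it.
    static long configuredIntervalMs(long defaultMs) {
        return Long.getLong(INTERVAL_PROPERTY, defaultMs);
    }

    // Simulates one RPC whose reply arrives after operationMs, guarded by a
    // reaper that cancels the pending future after reaperIntervalMs.
    // Returns true if the reply arrived before the reaper fired.
    static boolean survives(long operationMs, long reaperIntervalMs)
            throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        try {
            CompletableFuture<String> reply = new CompletableFuture<>();
            // Simulated device reply.
            scheduler.schedule(() -> { reply.complete("ok"); },
                    operationMs, TimeUnit.MILLISECONDS);
            // Reaper: cancelling an already completed future is a no-op.
            scheduler.schedule(() -> { reply.cancel(false); },
                    reaperIntervalMs, TimeUnit.MILLISECONDS);
            try {
                reply.get();
                return true;
            } catch (CancellationException | ExecutionException e) {
                // The caller sees "Task was cancelled"-style failures here.
                return false;
            }
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // A 400 ms operation never survives a 100 ms reaper, however often it is retried.
        System.out.println(survives(400, 100));
        // The same kind of operation completes once the interval is raised.
        System.out.println(survives(50, configuredIntervalMs(1000)));
    }
}
```

With a fixed interval shorter than the operation, every retry hits the same cancellation, which is why a configurable interval is needed rather than a retry loop.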



 Comments   
Comment by Anil Vishnoi [ 06/Aug/16 ]

Hi Ashwin,

I pushed a patch to fix the issue. It increases the timeout to 1000 ms and also provides a config knob to change it.

stable/beryllium : https://git.opendaylight.org/gerrit/#/c/43273/

Can you please test it in your environment and update the bug with your findings?

Thanks
Anil

Comment by Ashwin P [ 07/Aug/16 ]

Since this is a reaper timeout (and has no performance impact under normal operating conditions), I'd suggest setting it to a much larger default value (e.g. 60 sec). I tried this patch with a 60 sec timeout in the config file, and it seems to be working fine.

Comment by Anil Vishnoi [ 07/Aug/16 ]

I believe 60 seconds is too long to wait for a single device operation. Do you see any scenario where a single OVSDB RPC operation can take 60 seconds?

Waiting longer for tasks has a memory overhead, because all the tasks accumulate, along with their respective reaper tasks. So if you connect 1000 nodes, waiting for 60 seconds will probably accumulate a lot of short-lived objects, and cleaning them up will trigger a major GC, which can cause issues in a clustered environment. So unless there is a specific scenario where the default value of 1000 seconds is not appropriate, in my opinion we should keep the timeout to a minimum. For specific scenarios, users can always tweak the provided config knob.
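As a rough illustration of the accumulation argument above, the sketch below computes an upper bound on tracked RPCs when no reply ever arrives, so each request stays pending until its reaper fires. The rate of 10 RPCs per node per second is an assumed figure chosen for the example, not a measurement, and `PendingBound` and `maxPending` are hypothetical names.

```java
public class PendingBound {
    // Upper bound on pending RPCs if every request waits for the full timeout:
    // nodes * request rate * timeout.
    static long maxPending(long nodes, long rpcsPerNodePerSecond, long timeoutMs) {
        return nodes * rpcsPerNodePerSecond * timeoutMs / 1000;
    }

    public static void main(String[] args) {
        // Assumed load: 1000 nodes, 10 RPCs per node per second.
        System.out.println(maxPending(1000, 10, 1_000));   // 1000 ms timeout: prints 10000
        System.out.println(maxPending(1000, 10, 60_000));  // 60 s timeout: prints 600000
    }
}
```

Under these assumptions, a 60 second timeout allows 60 times as many pending entries (each with its own reaper task) as a 1000 ms timeout.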

Comment by Ashwin P [ 07/Aug/16 ]

Thanks for the detailed explanation. I agree with the reasoning.

Comment by Anil Vishnoi [ 07/Aug/16 ]

(In reply to Anil Vishnoi from comment #3)
> I believe 60 second is too long to wait for a single device operation. Do
> you see any scenario where any single ovsdb rpc operation can take 60
> seconds?
>
> Waiting longer for task has a memory overhead, because all the task are
> getting accumulated and their respective reaper task as well. So if you
> connect 1000 nodes, waiting for 60 seconds probably accumulate lot of short
> lived object and cleaning them up will trigger major GC and that can be
> issues in clustered environment. So until and unless there is any specific
> scenario where default value of 1000 second is not appropriate, in my
> opinion we should keep the timeout minimum. For specific scenario user can
> always tweak the provided config knob.

Small correction: it's 1000 milliseconds, not 1000 seconds.

Comment by Anil Vishnoi [ 07/Aug/16 ]

master : https://git.opendaylight.org/gerrit/#/c/43276/
