[OVSDB-358] [SR3] RPC timeout in JsonRPCEndpoint is not configurable

Created: 06/Aug/16  Updated: 19/Oct/17  Resolved: 10/Aug/16

| Status: | Resolved |
| Project: | ovsdb |
| Component/s: | Library |
| Affects Version/s: | unspecified |
| Fix Version/s: | None |
| Type: | Bug |
| Reporter: | Ashwin P |
| Assignee: | Unassigned |
| Resolution: | Done |
| Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Operating System: All |
| External issue ID: | 6352 |
| Priority: | High |
| Description |

The RPC timeout in the JsonRPCEndpoint getClient method is hard-coded to 300 ms. This timeout is not configurable, and it can be quite aggressive for heavy-weight operations on OVSDB, causing the pending task to be cancelled with an exception. One possible solution would be to retry when the task is cancelled; however, for operations on OVSDB that genuinely take more than 300 ms this will not work, because the retried task will simply be cancelled again.
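As a minimal sketch of the reported behavior (class and method names here are illustrative, not the actual JsonRPCEndpoint source), a hard-coded reaper timeout of this kind looks roughly like:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only, not the real JsonRPCEndpoint code: a reaper
// cancels any RPC future that has not completed within a fixed 300 ms window.
public class RpcReaperSketch {
    private static final long RPC_TIMEOUT_MILLIS = 300; // hard-coded, not configurable

    private final ScheduledExecutorService reaper =
            Executors.newSingleThreadScheduledExecutor();

    public <T> Future<T> sendRpc(CompletableFuture<T> pending) {
        // A heavy-weight OVSDB operation that takes longer than 300 ms is
        // cancelled even though the device is still processing it, so a
        // retry hits the same window and is cancelled again.
        reaper.schedule(() -> pending.cancel(true),
                RPC_TIMEOUT_MILLIS, TimeUnit.MILLISECONDS);
        return pending;
    }
}
```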
| Comments |
| Comment by Anil Vishnoi [ 06/Aug/16 ] |
Hi Ashwin, I pushed a patch to fix the issue. It increases the timeout to 1000 ms and also provides a config knob to change it.

stable/beryllium: https://git.opendaylight.org/gerrit/#/c/43273/

Can you please test it in your environment and update the bug with your findings? Thanks
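For readers without access to the gerrit change, the general shape of such a config knob might look like the sketch below; the property name `ovsdb.rpc.task.timeout`, the class name, and the lookup mechanism are assumptions for illustration, not the actual patch:

```java
// Sketch of a configurable RPC timeout with a 1000 ms default
// (property name and class are illustrative assumptions).
public final class RpcTimeoutConfig {
    private static final long DEFAULT_TIMEOUT_MILLIS = 1000;

    private RpcTimeoutConfig() {
    }

    public static long rpcTimeoutMillis() {
        String configured = System.getProperty("ovsdb.rpc.task.timeout");
        if (configured == null) {
            return DEFAULT_TIMEOUT_MILLIS;
        }
        try {
            return Long.parseLong(configured.trim());
        } catch (NumberFormatException e) {
            return DEFAULT_TIMEOUT_MILLIS; // fall back on a malformed value
        }
    }
}
```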
| Comment by Ashwin P [ 07/Aug/16 ] |
Since this is a reaper timeout (and has no performance impact under normal operating conditions), I'd suggest setting it to a much larger default value (e.g. 60 seconds). I tried this patch with a 60-second timeout in the config file, and it seems to be working fine.
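For illustration, overriding the knob for a 60-second timeout could then look like the following; the file path and key are assumptions consistent with the sketch above, not taken from the patch:

```
# etc/org.opendaylight.ovsdb.library.cfg  (path and key are illustrative)
ovsdb.rpc.task.timeout=60000
```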
| Comment by Anil Vishnoi [ 07/Aug/16 ] |
I believe 60 seconds is too long to wait for a single device operation. Do you see any scenario where a single OVSDB RPC operation can take 60 seconds? Waiting longer for a task has a memory overhead, because all the pending tasks accumulate, along with their respective reaper tasks. So if you connect 1000 nodes, waiting for 60 seconds would probably accumulate a lot of short-lived objects, and cleaning them up would trigger a major GC, which can be an issue in a clustered environment. So unless there is a specific scenario where the default value of 1000 seconds is not appropriate, in my opinion we should keep the timeout at the minimum. For specific scenarios, the user can always tweak the provided config knob.
| Comment by Ashwin P [ 07/Aug/16 ] |
Thanks for the detailed explanation. I agree with the reasoning.
| Comment by Anil Vishnoi [ 07/Aug/16 ] |
(In reply to Anil Vishnoi from comment #3) Small correction: it's 1000 milliseconds, not 1000 seconds.
| Comment by Anil Vishnoi [ 07/Aug/16 ] |