-
New Feature
-
Resolution: Done
-
Medium
-
None
Currently, two parameters control the wait time between NETCONF connection attempts: between-attempts-timeout-millis and sleep-factor.
Because the wait time is multiplied by sleep-factor after each connection failure, the gap between connection attempts grows very quickly. It also means that the wait time grows as fast as the total elapsed time of the connectivity loss or connection attempts.
For example, assuming the following default configuration values:
- between-attempts-timeout-millis: 2 seconds
- sleep-factor: 1.5
- connection-timeout-millis: 20 seconds
The total elapsed time in seconds after n reconnections is {{ 20 x n + (1.5^(n - 1) - 1) / (1.5 - 1) x 2 }}, and the next wait time in seconds is {{ 1.5^n x 2 }}.
So the total elapsed times (in seconds) look like this:
- n=17: 2963
- n=18: 4297
- n=19: 6288
- n=20: 9263
The next wait times (in seconds) are:
- n=17: 1970
- n=18: 2955
- n=19: 4433
- n=20: 6650
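The figures above can be reproduced with a short script. This is only a sketch of the two formulas as written in this description; the function and argument names are made up for illustration and are not taken from the NETCONF code base.

```python
def total_elapsed_seconds(n, base_wait=2.0, sleep_factor=1.5, connect_timeout=20.0):
    """Total elapsed time after n reconnections, per the formula in this issue:
    20 x n + (1.5^(n - 1) - 1) / (1.5 - 1) x 2
    """
    geometric_sum = (sleep_factor ** (n - 1) - 1) / (sleep_factor - 1)
    return connect_timeout * n + geometric_sum * base_wait


def next_wait_seconds(n, base_wait=2.0, sleep_factor=1.5):
    """Wait time before the next attempt, per the formula in this issue: 1.5^n x 2."""
    return sleep_factor ** n * base_wait


if __name__ == "__main__":
    # Print the same range of n that is listed above.
    for n in range(17, 21):
        total = total_elapsed_seconds(n)
        wait = next_wait_seconds(n)
        print(f"n={n}: total elapsed ~{total:.0f} s, next wait ~{wait:.0f} s")
```

The printed values may differ from the lists above by one second, depending on whether fractions are rounded or truncated.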
This means that after more than 2.5 hours of connectivity loss (n=20, 9263 seconds), the NETCONF session may not recover for about another 1 hour and 50 minutes (6650 seconds) because of the long wait time, even if the issue is resolved right after the 20th connection attempt.
This is most likely not the expected behavior. When exponential backoff is used, it is common to have a maximum value that limits the gap between connection attempts. It is also common to introduce randomness to avoid synchronization between different connection attempts. For example, if multiple devices are disconnected due to the same network connectivity issue, the controller may otherwise try to connect to all of them at almost the same time.
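As a rough illustration of the idea, capped exponential backoff with jitter could look like the sketch below. The names max_wait and jitter and their default values are assumptions made for this example only; they are not existing NETCONF configuration parameters.

```python
import random


def backoff_wait_seconds(failed_attempts, base_wait=2.0, sleep_factor=1.5,
                         max_wait=60.0, jitter=0.1, rng=random):
    """Return the wait before the next connection attempt, in seconds.

    failed_attempts: number of consecutive failures so far (0 for the first wait).
    max_wait: hypothetical upper bound on the un-jittered wait.
    jitter: hypothetical fraction; the wait is spread by +/- this fraction.
    """
    # Exponential growth, limited by the cap.
    wait = min(base_wait * sleep_factor ** failed_attempts, max_wait)
    # Randomize so that many devices do not retry at the same instant.
    return wait * (1.0 + rng.uniform(-jitter, jitter))


if __name__ == "__main__":
    for failures in (0, 5, 10, 20):
        print(f"failures={failures}: wait {backoff_wait_seconds(failures):.1f} s")
```

With these example values, the wait after 20 failures stays near 60 seconds instead of 6650 seconds, and each device picks a slightly different wait.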