[BGPCEP-294] ODL under load does not send Keepalives quickly enough. Created: 27/Oct/15 Updated: 03/Mar/19 Resolved: 06/Nov/15 |
|
| Status: | Resolved |
| Project: | bgpcep |
| Component/s: | BGP |
| Affects Version/s: | Bugzilla Migration |
| Fix Version/s: | Bugzilla Migration |
| Type: | Bug | ||
| Reporter: | Vratko Polak | Assignee: | Claudio David Gasparini |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| External issue ID: | 4533 |
| Description |
|
Seen on recent Lithium snapshot builds. In a Sandbox run [0] of a new suite (2 iBGP peers, 1 million routes total, continuous GET on the IPv4 topology during initial synchronization), one suite failed. Output from play.py shows that the hold timer expired, and karaf.log shows frequent OperationsLimiter activity. Previous runs did not hit this bug; more data will become available once the suite is merged. An increased number of peers would risk triggering |
| Comments |
| Comment by Radovan Sajben [ 02/Nov/15 ] |
|
... more evidence (1M UPDATEs and a target of 1M prefixes; each UPDATE carries 10 prefixes in the NLRI list and 9 prefixes in the WITHDRAWN list): https://jenkins.opendaylight.org/releng/job/bgpcep-csit-1node-periodic-bgp-ingest-mixed-only-stable-lithium/2/robot/report/log.html#s1-s2-t8-k3-k3-k2 |
| Comment by Claudio David Gasparini [ 04/Nov/15 ] |
|
After talking with Radovan, the observed issue is: 2015-11-01 16:15:15,620 | INFO | oupCloseable-6-2 | BGPSessionImpl | 281 - org.opendaylight.bgpcep.bgp-rib-impl - 0.4.3.SNAPSHOT | Failed to send message Keepalive [augmentation=[]] to socket java.io.IOException: Connection reset by peer This is expected behavior, |
| Comment by Vratko Polak [ 05/Nov/15 ] |
|
> the session is closed abruptly from client side Yes, but that is because the hold timer expired: by the time ODL eventually sent the Keepalive, it was too late. |
| Comment by Radovan Sajben [ 06/Nov/15 ] |
|
Adding more logging to the test tool revealed a regression in the tool itself (because of a decoding issue, only the first KEEPALIVE could reset the hold timer). After the test tool was corrected, the logs confirmed that there is no issue with KEEPALIVE on the controller side: https://jenkins.opendaylight.org/sandbox/job/bgpcep-csit-1node-periodic-bgp-ingest-mixed-only-stable-lithium/26/robot/report/log.html#s1-s2-t8-k3-k3-k2 |
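For illustration only, here is a minimal sketch of how a peer-side tool might split a received byte buffer into BGP messages and reset its hold timer on every KEEPALIVE or UPDATE, not just the first one. This is not the actual play.py code, and it does not claim to reproduce the specific decoding bug; the names `HoldTimer`, `split_messages` and `feed` are hypothetical. The header layout (16-byte marker, 2-byte length, 1-byte type) and the type codes follow RFC 4271.

```python
import struct

MARKER = b"\xff" * 16
HEADER_LEN = 19   # 16-byte marker + 2-byte length + 1-byte type (RFC 4271)
UPDATE = 2        # BGP message type code for UPDATE
KEEPALIVE = 4     # BGP message type code for KEEPALIVE


class HoldTimer:
    """Hypothetical helper: tracks the deadline for the next message."""

    def __init__(self, hold_time, now):
        self.hold_time = hold_time
        self.deadline = now + hold_time

    def reset(self, now):
        self.deadline = now + self.hold_time

    def expired(self, now):
        return now > self.deadline


def split_messages(buffer):
    """Return ([(type, body), ...], unconsumed remainder of buffer)."""
    messages = []
    while len(buffer) >= HEADER_LEN:
        if buffer[:16] != MARKER:
            raise ValueError("bad BGP marker")
        length, msg_type = struct.unpack("!HB", buffer[16:19])
        if length < HEADER_LEN:
            raise ValueError("bad BGP message length")
        if len(buffer) < length:
            break  # incomplete message; wait for more bytes
        messages.append((msg_type, buffer[HEADER_LEN:length]))
        buffer = buffer[length:]  # advance past this message
    return messages, buffer


def feed(timer, buffer, now):
    """Decode all complete messages; reset the timer for each KEEPALIVE/UPDATE."""
    messages, rest = split_messages(buffer)
    for msg_type, _body in messages:
        if msg_type in (KEEPALIVE, UPDATE):
            timer.reset(now)
    return messages, rest
```

The point of the loop in `split_messages` is that one socket read may carry several messages (or a partial one), so the decoder has to keep consuming until the buffer is exhausted; a decoder that stops after the first message would let the hold timer expire even though the controller keeps sending KEEPALIVEs.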