[BGPCEP-294] ODL under load does not send Keepalives quickly enough. Created: 27/Oct/15  Updated: 03/Mar/19  Resolved: 06/Nov/15

Status: Resolved
Project: bgpcep
Component/s: BGP
Affects Version/s: Bugzilla Migration
Fix Version/s: Bugzilla Migration

Type: Bug
Reporter: Vratko Polak Assignee: Claudio David Gasparini
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 4533

 Description   

Seen on recent Lithium snapshot builds.

A Sandbox run [0] of a new suite (2 iBGP peers, 1 million routes total, continuous GET on the IPv4 topology during initial synchronization) saw one suite fail. Output from play.py shows the hold timer was overstepped, and karaf.log shows frequent OperationsLimiter activity.

Previous runs did not see this bug; more data will become available when the suite is merged. Increasing the number of peers would risk triggering BGPCEP-289.

[0] https://jenkins.opendaylight.org/sandbox/job/bgpcep-csit-1node-periodic-bgp-ingest-only-stable-lithium/3/



 Comments   
Comment by Radovan Sajben [ 02/Nov/15 ]

... more evidence (1M UPDATEs and a target of 1M prefixes; each UPDATE includes 10 prefixes in the NLRI list and 9 prefixes in the WITHDRAWN list): https://jenkins.opendaylight.org/releng/job/bgpcep-csit-1node-periodic-bgp-ingest-mixed-only-stable-lithium/2/robot/report/log.html#s1-s2-t8-k3-k3-k2

Comment by Claudio David Gasparini [ 04/Nov/15 ]

After talking with Radovan, the observed issue is:

2015-11-01 16:15:15,620 | INFO | oupCloseable-6-2 | BGPSessionImpl | 281 - org.opendaylight.bgpcep.bgp-rib-impl - 0.4.3.SNAPSHOT | Failed to send message Keepalive [augmentation=[]] to socket java.io.IOException: Connection reset by peer
2015-11-01 16:15:15,621 | INFO | oupCloseable-6-2 | BGPPeer | 281 - org.opendaylight.bgpcep.bgp-rib-impl - 0.4.3.SNAPSHOT | Session with peer 10.30.11.190 went down
java.io.IOException: End of input detected. Close the session.

This is expected behavior.
The cause of the exception is that, once the message has been queued to be sent,
the session is closed abruptly from the client side and therefore the message cannot be sent.

Comment by Vratko Polak [ 05/Nov/15 ]

> the session is closed abruptly from the client side

Yes, but that is because the hold timer expired: by the time ODL eventually got around to sending the keepalive, it was too late.
Radovan, do you have a link similar to the one in comment 1 that would show the tool reacted to previous keepalives correctly?

Comment by Radovan Sajben [ 06/Nov/15 ]

Adding more logging to the test tool revealed a regression in the tool itself (because of a decoding issue, only the first KEEPALIVE could reset the hold timer). After the test tool was corrected, the logs confirmed that there is no issue with KEEPALIVE on the controller side: https://jenkins.opendaylight.org/sandbox/job/bgpcep-csit-1node-periodic-bgp-ingest-mixed-only-stable-lithium/26/robot/report/log.html#s1-s2-t8-k3-k3-k2
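
For illustration only, here is a minimal sketch of the behavior the corrected tool needs: every complete KEEPALIVE (or UPDATE) decoded from the receive buffer restarts the hold timer, not just the first one. This is not the actual play.py code; the names HoldTimer, split_messages and feed are hypothetical, and the only facts assumed are the standard BGP header layout (16-byte all-ones marker, 2-byte length, 1-byte type) and message type codes 2 (UPDATE) and 4 (KEEPALIVE).

```python
import struct

BGP_MARKER = b"\xff" * 16   # header marker, all ones
BGP_HEADER_LEN = 19         # marker (16) + length (2) + type (1)
MSG_UPDATE = 2
MSG_KEEPALIVE = 4


class HoldTimer:
    """Hypothetical hold timer that tracks an absolute deadline in seconds."""

    def __init__(self, hold_time, now):
        self.hold_time = hold_time
        self.deadline = now + hold_time

    def reset(self, now):
        self.deadline = now + self.hold_time

    def expired(self, now):
        return now >= self.deadline


def split_messages(buffer):
    """Split a byte buffer into complete BGP messages.

    Returns (list of (msg_type, body), unconsumed trailing bytes).
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= BGP_HEADER_LEN:
        marker, length, msg_type = struct.unpack_from("!16sHB", buffer, offset)
        if marker != BGP_MARKER or length < BGP_HEADER_LEN:
            raise ValueError("malformed BGP header")
        if len(buffer) - offset < length:
            break  # message not fully received yet
        body = buffer[offset + BGP_HEADER_LEN:offset + length]
        messages.append((msg_type, body))
        offset += length
    return messages, buffer[offset:]


def feed(timer, buffer, now):
    """Decode all complete messages and restart the hold timer for each
    KEEPALIVE or UPDATE. Returns the bytes left for the next read."""
    messages, rest = split_messages(buffer)
    for msg_type, _body in messages:
        if msg_type in (MSG_KEEPALIVE, MSG_UPDATE):
            timer.reset(now)
    return rest
```

A decoder that stops after the first message, or that mishandles the bytes left over from a partial read, would produce the symptom described above: the first KEEPALIVE resets the timer and later ones are silently ignored, so the tool closes the session even though the controller kept sending.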
