[CONTROLLER-1715] 6 GB heap is not entirely enough for BGP ingest test with 1 million prefixes when tell-based protocol is used Created: 09/Jun/17 Updated: 25/Jul/23 Resolved: 02/Jul/21 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | None |
| Fix Version/s: | 4.0.0 |
| Type: | Bug | ||
| Reporter: | Vratko Polak | Assignee: | Oleksii Mozghovyi |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | pt | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| External issue ID: | 8649 |
| Description |
|
This failure started happening consistently in a BGP test with single node, but using tell-based protocol, so this may be related to cluster testing. The BGP job runs several scenarios. For this bug I compare 4 of them, differing on which side initiates BGP connection and whether ask- or tell- based protocol is used. The bug manifests as inability of ODL to clear ipv4 topology [0] after the connection has been closed. This does not happen if ODL is not initiating the connection, perhaps because that scenario is run first in the suite and memory pressure is lower. This does not happen with the exact same suite running against ODL with ask-based protocol (both times it is the first suite after initial start or hard reboot). The title was chosen to reflect the only difference I have recognized in karaf.log [1]: Though subsequent messages suggest there is still some progress being done somewhere (those messages are also present in the preceding passing scenario): [0] https://logs.opendaylight.org/releng/jenkins092/bgpcep-csit-1node-periodic-bgp-ingest-only-carbon/306/log.html.gz#s1-s9-t17
|
| Comments |
| Comment by Tom Pantelis [ 09/Jun/17 ] |
|
That indicates the DTCN generator actor's mailbox reached the limit. If it only happens with tell-based perhaps it's not throttling enough compared to ask-based. You can try increasing the mailbox capacity in the akka.conf. Default settings are: bounded-mailbox { mailbox-capacity = 5000 mailbox-push-timeout-time = 10ms }Perhaps we shouldn't use a bounded mailbox for DTCN generation especially since DTCL notification actor mailboxes are unbounded. As such, it seems using a bounded mailbox for the former doesn't really make sense. |
| Comment by Vratko Polak [ 12/Jun/17 ] |
|
This still occurs in every run. One suspicious thing I have not mentioned is that read from ipv4-example-topology take significantly long time. For example 7 minutes [2], while the previous steps are able to reach stability in around two minutes. |
| Comment by Vratko Polak [ 12/Jun/17 ] |
|
> significantly long time That is because Garbage Collection started being busy during that time, see [3]. |
| Comment by Vratko Polak [ 21/Jun/17 ] |
|
The test passed [4] today, even though the garbage collection log [5] still shows significant slowdown during its run. [4] https://logs.opendaylight.org/releng/jenkins092/bgpcep-csit-1node-periodic-bgp-ingest-only-carbon/317/log.html.gz#s1-s9 |
| Comment by Vratko Polak [ 06/Jul/17 ] |
|
Current runs around Carbon SR1 branch freeze show the suite does not fail [6], but garbage collection [7] is still slowing the test down. [] https://jenkins.opendaylight.org/releng/view/bgpcep/job/bgpcep-csit-1node-periodic-bgp-ingest-only-carbon/334/robot/bgpcep-bgp-ingest.txt/Singlepeer%20Prefixcount_1/ |
| Comment by Vratko Polak [ 12/Jul/17 ] |
|
> Current runs around Carbon SR1 branch freeze show the suite does not fail But not 100% of the time, sometimes the test still fails [8]. |
| Comment by Vratko Polak [ 12/Sep/17 ] |
|
Changed title to "6 GB heap is not entirely enough for BGP ingest test with 1 million prefixes when tell-based protocol is used". |
| Comment by Oleksii Mozghovyi [ 08/Apr/21 ] |
|
I've increased the number of prefixes to 1.000.000 for the following tests: |