CONTROLLER-1715

6 GB heap is not entirely enough for BGP ingest test with 1 million prefixes when tell-based protocol is used


    • Type: Bug
    • Resolution: Cannot Reproduce
    • 4.0.0
    • None
    • clustering
    • Operating System: All
      Platform: All

    • 8649

      This failure started happening consistently in a single-node BGP test that uses the tell-based protocol, so it may also be related to cluster testing.

      The BGP job runs several scenarios. For this bug I compare four of them, which differ in which side initiates the BGP connection and in whether the ask-based or the tell-based protocol is used.

      The bug manifests as ODL being unable to clear the ipv4 topology [0] after the connection has been closed. This does not happen when ODL is not the side initiating the connection, perhaps because that scenario runs first in the suite and memory pressure is lower. It also does not happen when the exact same suite runs against ODL with the ask-based protocol (in both cases it is the first suite after an initial start or a hard reboot).

      The title was chosen to reflect the only difference I have noticed in karaf.log [1]:
      2017-06-09 06:33:50,808 | INFO | lt-dispatcher-20 | EmptyLocalActorRef | 155 - com.typesafe.akka.slf4j - 2.4.18 | Message [org.opendaylight.controller.cluster.datastore.messages.DataTreeChanged] from Actor[akka://opendaylight-cluster-data/deadLetters] to Actor[akka://opendaylight-cluster-data/user/$M] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

      However, subsequent messages suggest that some progress is still being made somewhere (these messages are also present in the preceding passing scenario):
      2017-06-09 06:33:51,408 | WARN | lt-dispatcher-41 | FrontendClientMetadataBuilder | 180 - org.opendaylight.controller.sal-distributed-datastore - 1.5.1.Carbon | member-1-shard-topology-operational: Unknown history for aborted transaction member-1-datastore-operational-fe-0-txn-31-1, ignoring
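
      When comparing karaf.log between the passing and failing scenarios, it may help to count dead-letter entries per message class instead of reading the log by eye. The following is only a minimal sketch: the pipe-separated field layout is inferred from the two excerpts quoted above, and the names parse_line and count_dead_letters are hypothetical helpers, not part of ODL or the CSIT suites.

```python
import re
from collections import Counter

# Field layout assumed from the two karaf.log excerpts quoted above:
# timestamp | level | thread | logger | bundle | message
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s*\|\s*"
    r"(?P<level>\w+)\s*\|\s*(?P<thread>[^|]+?)\s*\|\s*"
    r"(?P<logger>[^|]+?)\s*\|\s*(?P<bundle>[^|]+?)\s*\|\s*(?P<message>.*)$"
)

# Matches the Akka dead-letter notice and captures the undelivered message class.
DEAD_LETTER_RE = re.compile(r"Message \[(?P<cls>[^\]]+)\] .* was not delivered")


def parse_line(line):
    """Split one log line into its fields; return None if it does not match
    the assumed layout (for example, stack trace continuation lines)."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def count_dead_letters(lines):
    """Count dead-letter notices per undelivered message class."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        match = DEAD_LETTER_RE.search(entry["message"])
        if match:
            counts[match.group("cls")] += 1
    return counts
```

      To run this on a downloaded copy of [1], the lines can be read with gzip.open(path, "rt") and passed to count_dead_letters; doing the same for a passing scenario gives a direct comparison. Note that Akka stops logging dead letters after a configurable limit, so the counts are a lower bound rather than an exact figure.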

      [0] https://logs.opendaylight.org/releng/jenkins092/bgpcep-csit-1node-periodic-bgp-ingest-only-carbon/306/log.html.gz#s1-s9-t17
      [1] https://logs.opendaylight.org/releng/jenkins092/bgpcep-csit-1node-periodic-bgp-ingest-only-carbon/306/odl1_karaf.log.gz

       

       

      [0] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/bgpcep-csit-1node-bgp-ingest-all-silicon/216/robot-plugin/log.html.gz#s1-s9-t3-k2-k1

      [1] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/bgpcep-csit-1node-bgp-ingest-all-silicon/216/robot-plugin/log.html.gz#s1-s9-t5-k2-k1-k1-k1-k2

      [2] https://github.com/opendaylight/integration-test/blob/master/csit/suites/bgpcep/bgpingest/singlepeer_prefixcount.robot#L51

            oleksii.mozghovyi Oleksii Mozghovyi
            vrpolak Vratko Polak