CONTROLLER-1763

On restarting ODL on one node, ODL on another node dies in a clustered setup


    • Type: Bug
    • Resolution: Cannot Reproduce
    • None
    • Carbon
    • clustering
    • None
    • Operating System: All
      Platform: All

    • 9064
    • Highest

      Description of problem:
      While running low-stress longevity tests with Browbeat+Rally (creating 40 neutron resources 2 at a time and deleting them, over and over again) in a clustered ODL setup, ODL on controller-1 hits OOM about 42 hours into the test. At that point ODL on controller-2 is functional, while ODL on controller-0 appears to be running with its ports up but is non-functional (see BZ https://bugzilla.redhat.com/show_bug.cgi?id=1486060). When ODL on controller-0 is restarted at around 16:01 UTC 08/28/2017 to make it functional again, ODL on controller-2 dies at around 16:04 UTC 08/28/2017. ODL on controller-1, which hit OOM, is left alone.

      Here we can see the karaf process count going to 0 on controller-2 around 16:04 UTC 08/28/2017: https://snapshot.raintank.io/dashboard/snapshot/chxdQkhAw3X8l9LS2HNNzCZGQHQvWubO?orgId=2

      The heap is dumped before the process dies; however, the graph clearly shows that the 2G heap limit is not reached: https://snapshot.raintank.io/dashboard/snapshot/RMuDksXZ61ql2kMA47wqBHUXQeYWG05g?orgId=2

      Max heap used is around 1.4G

      Setup:
      3 ODLs
      3 OpenStack Controllers
      3 Compute nodes

      ODL RPM from upstream: python-networking-odl-11.0.0-0.20170806093629.2e78dca.el7ost.noarch

      Test:
      Create 40 neutron resources (routers, networks, etc.) 2 at a time using Rally and delete them, over and over again. This is a long-running, low-stress test.
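
      For readers unfamiliar with this load pattern, the following is a minimal sketch of the create/delete loop described above. It is not the Browbeat or Rally implementation: `create_resource` and `delete_resource` are hypothetical stand-ins for the Neutron API calls, and it assumes each resource is deleted right after it is created, which the report does not state explicitly. Only the numbers 40 and 2 come from the report.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

TOTAL_RESOURCES = 40  # resources per iteration (from the report)
CONCURRENCY = 2       # resources handled at the same time (from the report)


def create_resource(kind):
    """Hypothetical stand-in for a Neutron create call (router, network, ...)."""
    return {"id": str(uuid.uuid4()), "kind": kind}


def delete_resource(resource):
    """Hypothetical stand-in for the matching Neutron delete call."""
    return resource["id"]


def create_and_delete(kind):
    # Assumption: each resource is deleted right after it is created.
    resource = create_resource(kind)
    return delete_resource(resource)


def run_iteration(kind="router"):
    # The pool size caps the number of create/delete pairs in flight at 2.
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        return list(pool.map(create_and_delete, [kind] * TOTAL_RESOURCES))


def run_longevity(iterations, kind="router"):
    # The real test repeats for many hours; here the repeat count is explicit.
    total_deleted = 0
    for _ in range(iterations):
        total_deleted += len(run_iteration(kind))
    return total_deleted
```

      Calling `run_longevity(3)` would run three iterations of 40 create/delete pairs each, never with more than 2 in flight.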

      Additional info:
      ODL Controller-0 Logs:
      http://8.43.86.1:8088/smalleni/karaf-controller-0.log.tar.gz
      ODL Controller-1 Logs:
      http://8.43.86.1:8088/smalleni/karaf-controller-1.log.tar.gz
      http://8.43.86.1:8088/smalleni/karaf-controller-1-rollover.log.tar.gz
      ODL Controller-2 Logs:
      http://8.43.86.1:8088/smalleni/karaf-controller-2.log.tar.gz

            vorburger Michael Vorburger
            smalleni@redhat.com Sai Sindhur Malleni