netvirt / NETVIRT-1168

snat-conntrack CSIT job connectivity failures


      There are sporadic connectivity failures in our -upstream-stateful-snat-conntrack-oxygen- job,
      spread across multiple tempest.scenario tests. I have created a new label,
      "csit:snat-conntrack", to group these more easily.

      I am filing this issue for a non-tempest failure that happened in an Oxygen release
      candidate job. A subsequent job did not fail. Since the overall nature of these failures
      is the same, I'm hoping there is a single root cause (or just a few) whose fix will clean up
      all the failures, in tempest and otherwise.

      After poking around a little and comparing the flow table on compute node 0 in a passing run against the
      same node in the failing run, it looks like some flows are missing in the failing run. For
      example, the failing run has no flow in table 36, but I see this one in the passing job:

       cookie=0x9001392, duration=486.839s, table=36, n_packets=916, n_bytes=90892, priority=5,tun_id=0x5e actions=write_metadata:0x1392000000/0xfffffffff000000,goto_table:51

      Also, there are some 8 flows in table=51 in the passing job, but only 4 in the failing job.
      This could be explained by a few things. Maybe the extra flows are leftovers and should
      not even be there in the passing job. Or maybe the instances created in the test landed
      on different compute nodes, resulting in the difference. Either way, I just wanted to point
      it out.
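
      For anyone who wants to repeat the comparison, here is a minimal sketch of a per-table flow count diff. It assumes the flow tables from each run have been saved as text in the same format as the flow quoted above. The function names are hypothetical, and the table=51 line in the sample data is a made-up placeholder, not output from either job.

```python
import re
from collections import Counter

# Matches the "table=N" field of a flow entry. It does not match "goto_table:N",
# because that action uses a colon rather than "=".
TABLE_RE = re.compile(r"\btable=(\d+)")


def flows_per_table(dump_text):
    """Count flow entries per OpenFlow table in dump-flows style text."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = TABLE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def diff_table_counts(passing, failing):
    """Return {table: (passing_count, failing_count)} for tables whose counts differ."""
    tables = sorted(set(passing) | set(failing))
    return {t: (passing[t], failing[t]) for t in tables if passing[t] != failing[t]}


# Sample data: the first line is the table 36 flow quoted in this issue.
# The table=51 line is a hypothetical placeholder used only for illustration.
passing_dump = """\
 cookie=0x9001392, duration=486.839s, table=36, n_packets=916, n_bytes=90892, priority=5,tun_id=0x5e actions=write_metadata:0x1392000000/0xfffffffff000000,goto_table:51
 cookie=0x0, duration=1.0s, table=51, priority=0 actions=drop
"""
failing_dump = """\
 cookie=0x0, duration=1.0s, table=51, priority=0 actions=drop
"""

if __name__ == "__main__":
    diff = diff_table_counts(flows_per_table(passing_dump), flows_per_table(failing_dump))
    print(diff)  # prints {36: (1, 0)}
```

      Counting per table only shows where the two runs diverge. It does not say whether the extra flows in the passing run are leftovers or whether instance placement explains the difference.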

            jluhrsen Jamo Luhrsen