netvirt / NETVIRT-1022

CSIT Sporadic failures - major functionality breakage

      From time to time we find jobs that go badly wrong and have lots of failures. When
      debugging and looking at the karaf.log there are many red flags, but one is the repeated
      logging of this message:

      2017-11-21 00:15:51,069 | INFO | pool-43-thread-1 | NaptPacketInHandler | 336 - org.opendaylight.netvirt.natservice-impl - 0.6.0.SNAPSHOT | onPacketReceived : Retry Packet IN Queue Size : 0

      In this job [0], that message was printed over 900 times.

      Something fundamental is broken in the environment, and the above log
      message may itself be benign. It does not appear very often in the jobs that are
      passing and healthy.

      Here is the karaf.log [1] for that job [0]. The log also has a lot of NullPointerException,
      TransactionCommitFailedException, OptimisticLockFailedException, and UnsupportedOperationException entries.

      More debugging and analysis are needed to figure out what is really broken.

      [0] https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-ocata-upstream-stateful-oxygen/397/log_connectivity.html.gz
      [1] https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-ocata-upstream-stateful-oxygen/397/odl_1/odl1_karaf.log.gz
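To make the comparison between a failing job and a healthy one less manual, the red flags named above could be counted with a small script. This is only a sketch: the search strings come from this report, while the function names, the dictionary labels, and the local file path in the usage note are assumptions made for illustration.

```python
import gzip
import re
from collections import Counter

# Search strings are the ones mentioned in this report. The keys are
# labels chosen for this sketch, not names used anywhere in netvirt.
PATTERNS = {
    "retry_packet_in": re.compile(r"Retry Packet IN Queue Size"),
    "NullPointerException": re.compile(r"NullPointerException"),
    "TransactionCommitFailedException": re.compile(r"TransactionCommitFailedException"),
    "OptimisticLockFailedException": re.compile(r"OptimisticLockFailedException"),
    "UnsupportedOperationException": re.compile(r"UnsupportedOperationException"),
}


def count_red_flags(lines):
    """Count how many lines match each pattern (a line may match several)."""
    counts = Counter({name: 0 for name in PATTERNS})
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def count_in_gzip(path):
    """Run count_red_flags over a gzip-compressed log such as odl1_karaf.log.gz."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return count_red_flags(handle)
```

For example, after downloading [1] to a local file (the name `odl1_karaf.log.gz` is assumed here), `count_in_gzip("odl1_karaf.log.gz")` returns the per-pattern line counts, which can then be set side by side with the counts from a passing job.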

            Arunprakash Arunprakash D
            jluhrsen Jamo Luhrsen