[NETVIRT-873] CSIT 3node failures - External Network Suite - FIP connectivity failures Created: 29/Aug/17  Updated: 23/Oct/18  Resolved: 23/Oct/18

Status: Resolved
Project: netvirt
Component/s: General
Affects Version/s: Nitrogen
Fix Version/s: None

Type: Bug
Priority: Medium
Reporter: Jamo Luhrsen
Assignee: Chetan Arakere Gowdru
Resolution: Won't Do
Votes: 0
Labels: csit:failures
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 9074

 Description   

https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-3node-openstack-ocata-upstream-stateful-nitrogen/148/log.html.gz#s1-s1-s3-t13

These failures do not occur in our 1node (non-HA) job, which runs the
same tests. Note also that the failures happen before any HA-related
activities (e.g., taking a controller in/out of the cluster) are
performed.

My first suggestion would be to compare flow table entries of the
control/compute nodes in this job vs a passing example in the 1node
job:

https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-ocata-upstream-stateful-nitrogen/152/
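
A minimal sketch of such a comparison, assuming the flow dumps from both jobs have been saved as plain text in `ovs-ofctl dump-flows` format. MACs, IPs and metadata differ between runs, so the sketch compares only a coarse signature per flow: table, priority and the final `goto_table`/`resubmit` action. The function names `signature` and `compare` are hypothetical and not part of netvirt or the CSIT suites.

```python
import re
from collections import Counter

TABLE = re.compile(r"\btable=(\d+)")
PRIORITY = re.compile(r"\bpriority=(\d+)")
# Only flows that end in goto_table or resubmit are considered here.
TERMINAL = re.compile(r"(goto_table:\d+|resubmit\([^)]*\))\s*$")


def signature(line):
    """Return (table, priority, final action) for a flow line, or None."""
    table = TABLE.search(line)
    terminal = TERMINAL.search(line)
    if not (table and terminal):
        return None
    priority = PRIORITY.search(line)
    return (
        int(table.group(1)),
        int(priority.group(1)) if priority else None,
        terminal.group(1),
    )


def compare(failing_dump, passing_dump):
    """Return the signatures seen only in the failing or only in the passing dump."""
    failing = Counter(filter(None, map(signature, failing_dump.splitlines())))
    passing = Counter(filter(None, map(signature, passing_dump.splitlines())))
    # Counter subtraction keeps only the positive differences.
    return failing - passing, passing - failing
```

Calling `compare(failing_text, passing_text)` returns two counters; any signature that appears in only one of them is a candidate for closer inspection in the full dump.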



 Comments   
Comment by Karthikeyan Krishnan [ 30/Aug/17 ]

I compared Build #148 (the 4 FIP TCs failed) with Build #149 (the 4 FIP TCs passed). The observations are below.

Build #148:
===========

Table 25->27
cookie=0x8000004, duration=101.387s, table=25, n_packets=0, n_bytes=0, priority=10,ip,dl_dst=fa:16:3e:b3:81:8f,nw_dst=10.10.10.10 actions=set_field:90.0.0.6->ip_dst,write_metadata:0x324c0/0xfffffe,goto_table:27

Table 21->25 (here, instead of pointing to table 25, the flow points to the remote tunnel out port). This is the root cause of the problem: the local FIP flow is missing.

cookie=0x8000003, duration=101.222s, table=21, n_packets=6, n_bytes=588, priority=42,ip,metadata=0x324b6/0xfffffe,nw_dst=10.10.10.10 actions=set_field:fa:16:3e:b3:81:8f->eth_dst,load:0x700->NXM_NX_REG6[],resubmit(,220)

Log Reference Link:
==================
https://jenkins.opendaylight.org/releng/user/jluhrsen/my-views/view/netvirt%20csit/job/netvirt-csit-3node-openstack-ocata-upstream-stateful-nitrogen/148/robot/report/log.html#s1-s1-s3-t15

Build #149
==========

Table 21->25 (Local FIP entry properly got installed)
cookie=0x8000003, duration=65.479s, table=21, n_packets=6, n_bytes=588, priority=42,ip,metadata=0x30d44/0xfffffe,nw_dst=10.10.10.13 actions=set_field:fa:16:3e:b3:95:35->eth_dst,goto_table:25

Table 25->27
cookie=0x8000004, duration=65.511s, table=25, n_packets=6, n_bytes=588, priority=10,ip,dl_dst=fa:16:3e:b3:95:35,nw_dst=10.10.10.13 actions=set_field:90.0.0.8->ip_dst,write_metadata:0x30d4e/0xfffffe,goto_table:27

Log Reference Link:
=======================
https://jenkins.opendaylight.org/releng/user/jluhrsen/my-views/view/netvirt%20csit/job/netvirt-csit-3node-openstack-ocata-upstream-stateful-nitrogen/149/robot/report/log.html#s1-s1-s3-t15

We will watch the upcoming CSIT 3-node job runs to confirm whether the issue occurs consistently.
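
To check later runs for the same table 21 symptom, a small script along these lines could be run over the saved flow dump. It is only a sketch: the function name `table21_fip_next_hop` and its return labels are hypothetical, and it assumes, following the comparison above, that `goto_table:25` marks the local FIP flow and `resubmit(,220)` marks the remote (tunnel) path.

```python
import re


def table21_fip_next_hop(dump, fip):
    """Classify how table 21 forwards traffic destined to the floating IP `fip`."""
    # Match the exact address: 10.10.10.1 must not match 10.10.10.10 or a /prefix.
    dst = re.compile(r"\bnw_dst=" + re.escape(fip) + r"(?![\d./])")
    for line in dump.splitlines():
        if not re.search(r"\btable=21\b", line):
            continue
        if not dst.search(line):
            continue
        actions = line.split("actions=", 1)[-1]
        if "goto_table:25" in actions:
            return "local"    # local FIP flow installed (Build #149 pattern)
        if "resubmit(,220)" in actions:
            return "remote"   # sent to the egress table instead (Build #148 pattern)
        return "other"
    return "missing"          # no table 21 flow for this FIP at all
```

For the two flows quoted above, the Build #148 entry for 10.10.10.10 would be classified as `remote` and the Build #149 entry for 10.10.10.13 as `local`.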

Comment by Jamo Luhrsen [ 01/Sep/17 ]

I see some of these that might be related:

2017-09-01 17:23:01,300 | INFO | 2]-nio2-thread-5 | ServerSession | 130 - org.apache.sshd.core - 0.14.0 | Server session created from /10.29.4.11:52656
2017-09-01 17:23:01,356 | WARN | 2]-nio2-thread-3 | ServerSession | 130 - org.apache.sshd.core - 0.14.0 | Exception caught
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)[:1.8.0_141]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)[:1.8.0_141]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)[:1.8.0_141]
at sun.nio.ch.IOUtil.read(IOUtil.java:197)[:1.8.0_141]
at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finishRead(UnixAsynchronousSocketChannelImpl.java:387)[:1.8.0_141]
at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finish(UnixAsynchronousSocketChannelImpl.java:191)[:1.8.0_141]
at sun.nio.ch.UnixAsynchronousSocketChannelImpl.onEvent(UnixAsynchronousSocketChannelImpl.java:213)[:1.8.0_141]
at sun.nio.ch.EPollPort$EventHandlerTask.run(EPollPort.java:293)[:1.8.0_141]
at java.lang.Thread.run(Thread.java:748)[:1.8.0_141]

Comment by Karthikeyan Krishnan [ 19/Sep/17 ]

I added a few info logs in Fibmanager-Impl, raised the review below on stable/nitrogen, and ran the 3-node cluster CSIT job 4 times. The issue has not been observed since the info logs were added to the code. I am still trying to identify the RCA and will post the result once it is identified.

https://git.opendaylight.org/gerrit/#/c/63081/

Stable/Nitrogen 3 Node Cluster Job:
------------------------------------
https://jenkins.opendaylight.org/releng/job/netvirt-csit-3node-openstack-ocata-gate-stateful-nitrogen/6/

Note: On the Master (Oxygen) 3-node cluster CSIT job, the issue is not reproducible.

Master (Oxygen) 3 Node Cluster Job:
------------------------------------
https://jenkins.opendaylight.org/releng/job/netvirt-csit-3node-openstack-ocata-gate-stateful-oxygen/1/

Comment by Sam Hague [ 23/Oct/18 ]

Closing this bug. Others will be opened once the recent clustering efforts make it easier to focus on individual issues.
