[NETVIRT-1212] CSIT fails with OVS 2.9.0 when group table fails to collect packets Created: 12/Apr/18  Updated: 28/Nov/19

Status: In Progress
Project: netvirt
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Medium
Reporter: Sam Hague Assignee: Srinivas Rachakonda
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

OVS 2.9.0 has a bug where group tables fail to collect packets, which causes many of the CSIT tests to fail. The bug below details the problem and the fix. The fix has been upstreamed, but a newer OVS version containing it has not been released yet.

https://bugzilla.redhat.com/show_bug.cgi?id=1544892

The upstream fix is: https://github.com/openvswitch/ovs/commit/058d3a4afddd4db6bf052c71ff431a59fa5a0335. Eric's comment about the patches pushed upstream:
Applied upstream:

14ebc6c199a7 ("tests/ofproto-dpif: New test for action_set after traversing patch port")
118b21d93f55 ("ofproto-dpif-xlate: translate action_set in clone action")

It has also been backported to branch-2.9.

A possible workaround is to install the OVS package mentioned in the bugzilla. The basic idea is to add the repo config and then install OVS from that repo. This could be done in a couple of ways:

1. In the current CSIT VM images, OVS is installed via the packer scripts. Add the repo config there and install OVS. Also remove the "sudo yum versionlock add openvswitch" line from the run.sh script, which is what locks the OVS version at 2.7.3.

2. After the VMs are started, ssh over to the VMs, configure the repo and install OVS.

I suspect option 1 is the easier solution, but it does require rebuilding the image.
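
The repo-config step shared by both options could be sketched roughly as below, assuming a yum-based VM. The function name write_ovs_repo, the repo id ovs-fix, and gpgcheck=0 are hypothetical choices for illustration; the real repo location is the one given in the bugzilla and is passed in as an argument rather than guessed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the CSIT scripts): write a yum .repo file
# pointing at a repo that carries the fixed OVS build.
#   $1 = baseurl of the repo (take this from the bugzilla; not hard-coded here)
#   $2 = directory to write the .repo file into (defaults to /etc/yum.repos.d)
write_ovs_repo() {
    local -r baseurl="$1"
    local -r repo_dir="${2:-/etc/yum.repos.d}"
    # gpgcheck=0 is an assumption for a throwaway test VM.
    cat > "${repo_dir}/ovs-fix.repo" <<EOF
[ovs-fix]
name=OVS build with the group table fix (placeholder name)
baseurl=${baseurl}
enabled=1
gpgcheck=0
EOF
}

# On a real VM the remaining steps would look roughly like this (not run here):
#   write_ovs_repo "<baseurl from the bugzilla>" /tmp && sudo cp /tmp/ovs-fix.repo /etc/yum.repos.d/
#   sudo yum install -y openvswitch
# and run.sh would no longer call "sudo yum versionlock add openvswitch".
```

Writing to a caller-supplied directory keeps the helper usable from both the packer scripts (option 1) and an ssh session on a running VM (option 2).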



 Comments   
Comment by Jamo Luhrsen [ 17/Apr/18 ]

According to this email, I think we can revert the patch to not upgrade OVS.

Comment by Sam Hague [ 23/Apr/18 ]

Leaving this open until we have a package. The patch has been upstreamed, but it needs to find its way into an rpm that we can install.

Comment by Jamo Luhrsen [ 24/Apr/18 ]

For kicks, I ran a job today with OVS 2.9.0 and it still has trouble, so it is safe to assume we still don't have the fix in the packages used by CSIT.

Comment by Sam Hague [ 02/Jun/18 ]

Upstream OVS has released 2.9.1 and 2.9.2, which have the fix. RDO has the following patch to include those builds in RDO: https://review.rdoproject.org/r/#/c/13839/. Once this patch merges we will have the fixes in RDO. Then we rebuild our packer images to pick up the new RDO package for openvswitch, remove the lock to 2.7, and we are good.

Comment by Abhinav Gupta [ 25/Nov/19 ]

do we still need to keep this open?

Comment by Jamo Luhrsen [ 25/Nov/19 ]

You will need to run some jobs with a newer OVS version to see. Looking at a main job here, which is queens and neon, it's still running OVS 2.8.

Comment by Abhinav Gupta [ 27/Nov/19 ]

Thanks JamO.
Srinivas, let's run a few jobs with OVS 2.9

Comment by Srinivas Rachakonda [ 27/Nov/19 ]

Scheduled a job to run with OVS 2.9.2

https://jenkins.opendaylight.org/sandbox/job/srini-netvirt-csit-1node-0cmb-1ctl-2cmp-openstack-queens-upstream-stateful-neon/

Comment by Srinivas Rachakonda [ 27/Nov/19 ]

Stack is failing after OVS 2.9.2 is installed.

https://jenkins.opendaylight.org/sandbox/job/srini-netvirt-csit-1node-0cmb-1ctl-2cmp-openstack-queens-upstream-stateful-neon/2/console

Comment by Jamo Luhrsen [ 27/Nov/19 ]

I don't think it ever even tries to run devstack. It looks like it's failing near the custom OVS install. Here is the snippet that caught my eye from the console log:

+ install_ovs_from_repo 10.30.170.102 file:/tmp/ovs_rpms
+ local -r ip=10.30.170.102
+ local -r rpm_repo=file:/tmp/ovs_rpms
+ echo 'Installing OVS from repo file:/tmp/ovs_rpms on 10.30.170.102 ...'
Installing OVS from repo file:/tmp/ovs_rpms on 10.30.170.102 ...
+ cat
+ scp /w/workspace/srini-netvirt-csit-1node-0cmb-1ctl-2cmp-openstack-queens-upstream-stateful-neon/install_ovs.sh 10.30.170.102:/tmp
Warning: Permanently added '10.30.170.102' (ECDSA) to the list of known hosts.
+ ssh -t -t 10.30.170.102 'bash /tmp/install_ovs.sh >> /tmp/install_ovs.txt 2>&1'
Warning: Permanently added '10.30.170.102' (ECDSA) to the list of known hosts.
Connection to 10.30.170.102 closed.
Build step 'Execute shell' marked build as failure

btw, I didn't see a stack.log file in the compute node logs, which is how I knew that devstack didn't run. look here.

Comment by Srinivas Rachakonda [ 28/Nov/19 ]

In the below log:

https://logs.opendaylight.org/sandbox/vex-yul-odl-jenkins-2/srini-netvirt-csit-1node-0cmb-1ctl-2cmp-openstack-queens-upstream-stateful-neon/2/control_1/install_ovs.txt.gz

there is a message saying openvswitch installed.

Installed:
openvswitch.x86_64 0:2.9.2-1.el7
openvswitch-dkms.x86_64 0:2.9.2-1.el7
openvswitch-selinux-policy.noarch 0:2.9.2-1.el7

But there is an error saying the kernel module was not updated:

++ sudo modinfo -n openvswitch
+ NEW_MOD=/lib/modules/3.10.0-1062.1.2.el7.x86_64/kernel/net/openvswitch/openvswitch.ko.xz
+ '[' /lib/modules/3.10.0-1062.1.2.el7.x86_64/kernel/net/openvswitch/openvswitch.ko.xz '!=' /lib/modules/3.10.0-1062.1.2.el7.x86_64/kernel/net/openvswitch/openvswitch.ko.xz ']'
+ echo 'Kernel module was not updated'
Kernel module was not updated
+ exit 1

Please suggest whether the kernel needs to be updated to a new version.
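
For reference, the check in the trace above appears to compare the path reported by "modinfo -n openvswitch" before and after the install. The install_ovs.sh source is not shown here, so the following is only a hypothetical reconstruction; the function and variable names are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of the check seen in the trace, not the actual
# install_ovs.sh code. Returns 0 when the module path changed between the two
# samples, otherwise prints the message from the log and returns 1.
kernel_module_updated() {
    local -r prev_mod="$1"   # path from "modinfo -n openvswitch" before the install
    local -r new_mod="$2"    # path from "modinfo -n openvswitch" after the install
    if [ "${prev_mod}" != "${new_mod}" ]; then
        return 0
    fi
    echo "Kernel module was not updated"
    return 1
}

# Assumed usage on the VM (not run here):
#   prev=$(sudo modinfo -n openvswitch)
#   ... install the openvswitch packages ...
#   new=$(sudo modinfo -n openvswitch)
#   kernel_module_updated "$prev" "$new" || exit 1
```

In the log both paths are the in-tree kernel/net/openvswitch module, which is why a check of this shape reports a failure even though the packages installed.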

Comment by Abhinav Gupta [ 28/Nov/19 ]

1. Why did we take 2.9.2? I checked, and 2.9.5 is the latest release with the most bug fixes in the 2.9.x track. Was there a specific reason?
2. The supported Linux kernel range is 3.10 to 4.13, so 3.10 should work. However, you can try upgrading the kernel to a newer version in this range.
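
A quick way to test a kernel against the 3.10 to 4.13 range could be sketched as follows. This is a minimal sketch that assumes GNU sort (for the -V option) and compares only major.minor; the function name kernel_in_range is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the kernel version in $1 lies within
# [$2, $3], comparing only the major.minor part. Assumes GNU sort -V.
kernel_in_range() {
    local -r ver="$(echo "$1" | cut -d. -f1,2)"   # e.g. 3.10.0-1062... -> 3.10
    local -r min="$2"
    local -r max="$3"
    # ver >= min if min sorts first; ver <= max if max sorts last.
    [ "$(printf '%s\n' "$min" "$ver" | sort -V | head -n1)" = "$min" ] &&
    [ "$(printf '%s\n' "$ver" "$max" | sort -V | tail -n1)" = "$max" ]
}

# Assumed usage on the VM (not run here):
#   kernel_in_range "$(uname -r)" 3.10 4.13 && echo "kernel is in the supported range"
```

A version string alone does not account for distribution backports, so this only confirms the nominal range.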

Generated at Wed Feb 07 20:23:30 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.