[NETVIRT-551] CSIT Sporadic failures - First port on compute (DHCP Server MAC) has ELAN drop rule instead of output to tunnel Created: 20/Mar/17  Updated: 13/Apr/17  Resolved: 13/Apr/17

Status: Resolved
Project: netvirt
Component/s: General
Affects Version/s: Carbon
Fix Version/s: None

Type: Bug
Reporter: Alon Kochba Assignee: Periyasamy Palanisamy
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 8023

 Description   

https://jenkins.opendaylight.org/releng/job/netvirt-patch-test-current-boron/229/

The failure is in getting the DHCP server IP from the console log:
https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-newton-nodl-v2-gate-stateful-boron/119/archives/log.html.gz#s1-s1-s1-t7

This is only a side effect of the metadata request failing, which delays the DHCP server IP output.
The actual cause is that, on compute 2, the ELAN pipeline flow for the metadata server (DHCP server) MAC address has a drop action instead of outputting to the tunnel.

cookie=0x803138a, duration=337.546s, table=51, n_packets=220, n_bytes=15264, priority=20,metadata=0x138a000000/0xffff000000,dl_dst=fa:16:3e:70:f7:f0 actions=drop
cookie=0x803138b, duration=317.800s, table=51, n_packets=100, n_bytes=9592, priority=20,metadata=0x138b000000/0xffff000000,dl_dst=fa:16:3e:6b:e0:64 actions=set_field:0x2->tun_id,output:3

Note that on the same compute node, other rules do point to that tunnel. This implies that only the first port on the compute node is affected, probably because of the auto tunnel configuration.

(30.0.0.2 fa:16:3e:70:f7:f0 - fails)
(40.0.0.2 fa:16:3e:6b:e0:64 - works)
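To spot this symptom in other runs, the table=51 dump can be scanned for DMAC flows whose only action is drop. The Python below is a hypothetical helper sketch, not part of netvirt or CSIT. It assumes the dump format shown above, which appears to be `ovs-ofctl dump-flows` output. Based only on the two flows in this report, it also assumes that the masked metadata bits carry the ELAN tag shifted left by 24 bits.

```python
import re

# The two table=51 flows from the compute 2 dump in this report.
FLOWS = [
    "cookie=0x803138a, duration=337.546s, table=51, n_packets=220, n_bytes=15264, "
    "priority=20,metadata=0x138a000000/0xffff000000,dl_dst=fa:16:3e:70:f7:f0 actions=drop",
    "cookie=0x803138b, duration=317.800s, table=51, n_packets=100, n_bytes=9592, "
    "priority=20,metadata=0x138b000000/0xffff000000,dl_dst=fa:16:3e:6b:e0:64 "
    "actions=set_field:0x2->tun_id,output:3",
]

# Assumption: the masked metadata value is the ELAN tag shifted left by 24 bits.
ELAN_TAG_SHIFT = 24

# Pulls table id, metadata/mask, destination MAC and the action list from one flow line.
FLOW_RE = re.compile(
    r"table=(?P<table>\d+),.*?"
    r"metadata=(?P<meta>0x[0-9a-f]+)/(?P<mask>0x[0-9a-f]+),"
    r"dl_dst=(?P<mac>[0-9a-f:]{17})\s+actions=(?P<actions>\S+)"
)


def parse_flow(line):
    """Parse one DMAC flow line; return None if it does not match the expected shape."""
    m = FLOW_RE.search(line)
    if m is None:
        return None
    meta = int(m["meta"], 16) & int(m["mask"], 16)
    return {
        "table": int(m["table"]),
        "elan_tag": meta >> ELAN_TAG_SHIFT,
        "mac": m["mac"],
        "actions": m["actions"],
    }


def drop_macs(lines, table=51):
    """Return (elan_tag, mac) for every DMAC flow in `table` whose action list is just 'drop'."""
    result = []
    for line in lines:
        flow = parse_flow(line)
        if flow and flow["table"] == table and flow["actions"] == "drop":
            result.append((flow["elan_tag"], flow["mac"]))
    return result


if __name__ == "__main__":
    for tag, mac in drop_macs(FLOWS):
        print(f"ELAN tag {tag:#x}: {mac} is dropped instead of tunnelled")
```

Matching on the exact string `drop` keeps the check narrow. The 40.0.0.2 flow, which sets `tun_id` and outputs to port 3, is not reported.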

Pings from the DHCP server to these VMs also fail later.



 Comments   
Comment by Jamo Luhrsen [ 20/Mar/17 ]

https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-newton-nodl-v2-upstream-learn-boron/193/archives/log.html.gz#s1-s1-s1-t7-k10-k2-k1-k3-k1-k15-k4

Comment by Jamo Luhrsen [ 20/Mar/17 ]

https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-newton-upstream-learn-boron/198/archives/log.html.gz#s1-s1-s1-t7-k10-k2-k1-k3-k1-k15-k4

Comment by Periyasamy Palanisamy [ 21/Mar/17 ]

With code review [1], VMs get their IPs properly in a single CSIT run.
The ODL logs also look correct (confirmed by adding a few more info-level logs), and VM/tunnel interface state events are processed in the correct order.
Let us run CSIT multiple times to confirm whether this issue is completely solved.

[1] https://git.opendaylight.org/gerrit/#/c/53588/

Comment by Jamo Luhrsen [ 10/Apr/17 ]

I think this came up again. I see this drop rule for a port in table 51:

table=51, n_packets=181, n_bytes=12338, priority=20,metadata=0x138b000000/0xffff000000,dl_dst=fa:16:3e:2e:46:90 actions=drop

direct link to flow dump output:

https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-newton-nodl-v2-upstream-stateful-boron/368/archives/log.html.gz#s1-s1-s1-t8-k17-k2-k1-k2-k1-k15-k4

Comment by Jamo Luhrsen [ 10/Apr/17 ]

(In reply to Jamo Luhrsen from comment #4)

maybe here too:

https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-newton-nodl-v2-upstream-transparent-boron/492/archives/log.html.gz#s1-s1-s1-t8-k17-k2-k1-k2-k1-k15

Comment by Periyasamy Palanisamy [ 11/Apr/17 ]

Fixes [1] and [2] for this bug were merged yesterday, after these CSIT runs. Let us see today's results.

[1] https://git.opendaylight.org/gerrit/#/c/53963/
[2] https://git.opendaylight.org/gerrit/#/c/53588/

Comment by Periyasamy Palanisamy [ 13/Apr/17 ]

The required fixes are merged in both master and boron.
I have not seen this issue for the last 2 days, so I am marking it as resolved.
We can reopen this bug if the issue reproduces again.

Generated at Wed Feb 07 20:21:51 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.