[OPNFLWPLUG-1060] instance not able to curl metadata Created: 12/Dec/18 Updated: 19/Mar/19 Resolved: 19/Mar/19 |
|
| Status: | Resolved |
| Project: | OpenFlowPlugin |
| Component/s: | General, openflowplugin-impl |
| Affects Version/s: | None |
| Fix Version/s: | Fluorine-SR2, Neon |
| Type: | Bug | Priority: | High |
| Reporter: | Jamo Luhrsen | Assignee: | Somashekhar Javalagi |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | apex:gate, csit:failures | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Description |
|
once in aprox 75 jobs, the apex job we are trying to make stable so it can be a voting |
| Comments |
| Comment by Jamo Luhrsen [ 12/Dec/18 ] |
|
The job failed both the l2 and external suites for the same reason. In the l2 suite, 6 instances are booted and the karaf.log and neutron server log is attached. |
| Comment by Vishal Thapar [ 12/Dec/18 ] |
|
This looks like very old dhcp issue where VM couldn't contact metadata service. At the time it was failing to acquire dhcp address and we increased timeout to 3 minutes. Here, it seems to get DHCP IP but something else is missing and not able to contact metadata service [1]. Starting network... Route is setup correctly [2] default via 192.0.2.1 dev br-ex |
| Comment by Jamo Luhrsen [ 12/Dec/18 ] |
|
some of the .gz files in the logs I've/Vishal has given will be corrupt. Here is a better link to |
| Comment by Jamo Luhrsen [ 12/Dec/18 ] |
|
Thanks for poking around thapar. This is a short job (30-40m) so just pay attention to min:sec anyway, looking at the meta-data log in the dhcp log you can see that 21.0.0.24 was allocated there. nothing stands out to me in the neutron server log around Anyway, I didn't find anything more to go on |
| Comment by Jamo Luhrsen [ 13/Dec/18 ] |
|
here is another example of this bug: |
| Comment by Vishal Thapar [ 13/Dec/18 ] |
|
Second job also has exact same issue. Can we log console log for scenario where it does work? May increase logging but will help confirm if the route add error is relevant or not. Issue is VM not able to reach metadata agent. Most likely a routing/arp issue. Can we capture arp table also along with console log, not sure if possible. Also found this bug https://jira.opnfv.org/browse/APEX-593 though in this case it goes through but just takes longer. In our case instance-id is never acquired, so suspect some route/arp for metadata is not resolved. Need someone more familiar with metadata and how networking for it works to figure out what is going wrong. |
| Comment by Tim Rozet [ 13/Dec/18 ] |
|
After looking through the logs, the metadata agent does not receive the curl from the failed instance. It does however, receive the requests from the instance before and after it on the same network. This looks like an openflow problem, but looking at the flows I don't see an issue. I'm running this in an infinite loop to recreate the same network, subnet, security group, and instance on my machine to see if I can reproduce it. |
| Comment by Jamo Luhrsen [ 13/Dec/18 ] |
|
yeah, trozet, that's what I noticed and linked to in an earlier comment about the I can also try to run some extra loops/instances in the csit and let jenkins run |
| Comment by Jamo Luhrsen [ 13/Dec/18 ] |
| Comment by Tim Rozet [ 14/Dec/18 ] |
|
I was able to reproduce the problem after 120 iterations. After metadata failed, I was unable to even ping or ssh into the cirros instance (but it did DHCP successfully). The ping request was making it to the instance but the reply was not making it back, and there was no entry in conntrack. Thanks to aswins for helping me debug the flows, he noticed that the ping request was actually bypassing conntrack. The ping request should end up in table 220, then piped to 239 to create a conntrack session. However this flow was directly sending it to the VM in table 220: cookie=0x8000007, duration=48738.948s, table=220, n_packets=174, n_bytes=14406, priority=9,reg6=0x20300 actions=output:"tap4386be41-ba" instead of hitting this flow (due to priority): cookie=0x6900000, duration=48738.948s, table=220, n_packets=0, n_bytes=0, priority=6,reg6=0x20300 actions=load:0x90020300->NXM_NX_REG6[],write_metadata:0/0xfffffffffe,goto_table:239
Therefore when the packets came back from the VM, conntrack was dropping them because there was no session: #compute: recirc_id(0),in_port(4),eth(src=fa:16:3e:8a:4e:0a,dst=fa:16:3e:58:31:2b),eth_type(0x8100),vlan(vid=1121,pcp=0),encap(eth_type(0x0800),ipv4(src=0.0.0.0/128.0.0.0,dst=0.0.0.0/128.0.0.0,frag=no)), packets:10, bytes:1020, used:0.118s, actions:pop_vlan,5 recirc_id(0),in_port(4),eth(src=fa:16:3e:8a:4e:0a,dst=fa:16:3e:58:31:2b),eth_type(0x8100),vlan(vid=1121,pcp=0),encap(eth_type(0x0806),arp(sip=21.0.0.2,tip=21.0.0.19,op=2/0xff)), packets:0, bytes:0, used:never, actions:pop_vlan,userspace(pid=4294963155,controller(reason=1,dont_send=0,continuation=0,recirc_id=864,rule_cookie=0x822002e,controller_id=0,max_len=65535)),5 recirc_id(0),in_port(5),eth(src=fa:16:3e:58:31:2b,dst=fa:16:3e:8a:4e:0a),eth_type(0x0806),arp(sip=21.0.0.19,tip=21.0.0.2,op=1/0xff,sha=fa:16:3e:58:31:2b), packets:0, bytes:0, used:never, actions:userspace(pid=4286503971,controller(reason=4,dont_send=0,continuation=0,recirc_id=863,rule_cookie=0x822002d,controller_id=0,max_len=65535)),push_vlan(vid=1121,pcp=0),4 recirc_id(0),in_port(5),eth(src=fa:16:3e:58:31:2b),eth_type(0x0800),ipv4(src=21.0.0.19,proto=1,frag=no), packets:10, bytes:980, used:0.118s, actions:ct(zone=5128),recirc(0x35e) recirc_id(0x35e),in_port(5),ct_state(-new-est-rel+inv+trk),ct_mark(0/0x1),eth_type(0x0800),ipv4(frag=no), packets:10, bytes:980, used:0.118s, actions:drop [root@overcloud-novacompute-0 ~]# ovs-dpctl show system@ovs-system: lookups: hit:34958088 missed:721437 lost:0 flows: 21 masks: hit:296264755 total:18 hit/pkt:8.30 port 0: ovs-system (internal) port 1: vxlan_sys_4789 (vxlan: packet_type=ptap) port 2: br-int (internal) port 3: br-ex (internal) port 4: eth0 port 5: tap4386be41-ba
I then deleted that flow, and I could then ssh into the cirros machine. Then I tried to curl metadata and it worked. I then needed to figure out if the same flow was causing issues for metadata. From there I created a cron job in cirros to curl for metadata every minute. I could see it hitting the metadata agent. I then added the problem flow back, and metadata curls were not coming into the server anymore. Therefore I believe this flow is the same root cause as the ping issue and the reason for this bug. In the CI logs I see that the flow that should go from 220 to 239 is missing, and the problem flow exists in that log (two times?): cookie=0x8000007, duration=272.526s, table=220, n_packets=0, n_bytes=0, priority=10,reg6=0xd00,metadata=0x1/0x1 actions=drop cookie=0x8000007, duration=272.522s, table=220, n_packets=0, n_bytes=0, priority=10,reg6=0xc00,metadata=0x1/0x1 actions=drop cookie=0x8000007, duration=721.965s, table=220, n_packets=118, n_bytes=11146, priority=9,reg6=0x400 actions=output:tuna642e00a5c1 cookie=0x8000007, duration=721.965s, table=220, n_packets=10, n_bytes=448, priority=9,reg6=0x300 actions=output:tun3884a6aa6dc cookie=0x8000007, duration=272.527s, table=220, n_packets=170, n_bytes=14940, priority=9,reg6=0xd00 actions=push_vlan:0x8100,set_field:5217->vlan_vid,output:"br-datacentre-" cookie=0x8000007, duration=272.522s, table=220, n_packets=0, n_bytes=0, priority=9,reg6=0xc00 actions=output:"br-datacentre-" cookie=0x6900000, duration=272.470s, table=220, n_packets=107, n_bytes=11218, priority=6,reg6=0xb00 actions=load:0x90000b00->NXM_NX_REG6[],write_metadata:0/0xfffffffffe,goto_table:239 cookie=0x8000007, duration=272.470s, table=220, n_packets=101, n_bytes=10494, priority=9,reg6=0x90000b00 actions=output:"tapd419615e-50"
*cookie=0x8000007, duration=267.979s, table=220, n_packets=0, n_bytes=0, priority=9,reg6=0x90000e00 actions=output:"tap8b015b1e-82" cookie=0x8000007, duration=267.979s, table=220, n_packets=170, n_bytes=13918, priority=9,reg6=0xe00 actions=output:"tap8b015b1e-82"*
cookie=0x6900000, duration=248.137s, table=220, n_packets=107, n_bytes=10856, priority=6,reg6=0x1200 actions=load:0x90001200->NXM_NX_REG6[],write_metadata:0/0xfffffffffe,goto_table:239 cookie=0x8000007, duration=248.137s, table=220, n_packets=101, n_bytes=10436, priority=9,reg6=0x90001200 actions=output:"tap2559ce3e-48" cookie=0x6900000, duration=721.965s, table=239, n_packets=154, n_bytes=19010, priority=100,ip actions=ct_clear,goto_table:240 cookie=0x6900000, duration=721.965s, table=239, n_packets=16, n_bytes=1168, priority=100,ipv6 actions=ct_clear,goto_table:240 cookie=0x6900000, duration=721.965s, table=239, n_packets=44, n_bytes=1896, priority=0 actions=goto_table:240
I'm not sure why the there is no flow to match on reg6, 0xe00 or 0x90000e00 and goto table 239. Could be timing with when the flow dumps are collected? Either way in my reproduction environment there is a bug. We can fix that and see if the problem goes away in CI. Will attach karaf log. Flow and other info here:
|
| Comment by Aswin Suryanarayanan [ 14/Dec/18 ] |
|
*cookie=0x8000007, duration=267.979s, table=220, n_packets=0, n_bytes=0, priority=9,reg6=0x90000e00 actions=output:"tap8b015b1e-82" cookie=0x8000007, duration=267.979s, table=220, n_packets=170, n_bytes=13918, priority=9,reg6=0xe00 actions=output:"tap8b015b1e-82"* Here the second flow should not have been there. Instead of that there should be a flow that sends packet to table=239 should be there matching reg6=0xe00. The second flow where it directly sends the packet to output port from table 220 is valid for ports for which SG is disabled. Usually the neutron port create happens in this order.(this is when I debugged an older release hope this is still the case.) 1)Port with security group disabled is created. (the second flow here will be programmed.) In this case the openflowplugin may be writing flows out of sync. |
| Comment by Jamo Luhrsen [ 14/Dec/18 ] |
|
ok, looks like we're narrowing in on the problem. Thanks guys. Here are two table=220 flow dumps from PASSING CASE: 153: cookie=0x8000007, duration=122.815s, table=220, n_packets=0, n_bytes=0, priority=10,reg6=0xc00,metadata=0x1/0x1 actions=drop 154: cookie=0x8000007, duration=122.801s, table=220, n_packets=0, n_bytes=0, priority=10,reg6=0xd00,metadata=0x1/0x1 actions=drop 155: cookie=0x8000007, duration=582.334s, table=220, n_packets=16, n_bytes=1384, priority=9,reg6=0x300 actions=output:tun3884a6aa6dc 156: cookie=0x8000007, duration=582.334s, table=220, n_packets=117, n_bytes=11104, priority=9,reg6=0x400 actions=output:tuna642e00a5c1 157: cookie=0x8000007, duration=122.815s, table=220, n_packets=0, n_bytes=0, priority=9,reg6=0xc00 actions=output:"br-datacentre-" 158: cookie=0x8000007, duration=122.801s, table=220, n_packets=236, n_bytes=22292, priority=9,reg6=0xd00 actions=push_vlan:0x8100,set_field:5217->vlan_vid,output:"br-datacentre-" 159: cookie=0x6900000, duration=122.695s, table=220, n_packets=107, n_bytes=11218, priority=6,reg6=0xb00 actions=load:0x90000b00->NXM_NX_REG6[],write_metadata:0/0xfffffffffe,goto_table:239 160: cookie=0x8000007, duration=122.688s, table=220, n_packets=101, n_bytes=10494, priority=9,reg6=0x90000b00 actions=output:"tapa92fefa9-ca" 161: cookie=0x8000007, duration=118.411s, table=220, n_packets=101, n_bytes=10494, priority=9,reg6=0x90000e00 actions=output:"tap3c18abac-bd" 162: cookie=0x6900000, duration=118.397s, table=220, n_packets=107, n_bytes=11218, priority=6,reg6=0xe00 actions=load:0x90000e00->NXM_NX_REG6[],write_metadata:0/0xfffffffffe,goto_table:239 163: cookie=0x8000007, duration=100.575s, table=220, n_packets=100, n_bytes=10394, priority=9,reg6=0x90001200 actions=output:"tap80f684f4-60" 164: cookie=0x6900000, duration=100.575s, table=220, n_packets=106, n_bytes=10814, priority=6,reg6=0x1200 actions=load:0x90001200->NXM_NX_REG6[],write_metadata:0/0xfffffffffe,goto_table:239 FAILING CASE: 153: cookie=0x8000007, duration=263.348s, table=220, n_packets=0, n_bytes=0, priority=10,reg6=0xc00,metadata=0x1/0x1 actions=drop 154: cookie=0x8000007, duration=263.348s, table=220, n_packets=0, n_bytes=0, priority=10,reg6=0xd00,metadata=0x1/0x1 actions=drop 155: cookie=0x8000007, duration=691.042s, table=220, n_packets=16, n_bytes=1384, priority=9,reg6=0x300 actions=output:tun3884a6aa6dc 156: cookie=0x8000007, duration=691.042s, table=220, n_packets=117, n_bytes=11104, priority=9,reg6=0x400 actions=output:tuna642e00a5c1 157: cookie=0x8000007, duration=263.348s, table=220, n_packets=0, n_bytes=0, priority=9,reg6=0xc00 actions=output:"br-datacentre-" 158: cookie=0x8000007, duration=263.348s, table=220, n_packets=150, n_bytes=13842, priority=9,reg6=0xd00 actions=push_vlan:0x8100,set_field:5217->vlan_vid,output:"br-datacentre-" 159: cookie=0x8000007, duration=263.336s, table=220, n_packets=92, n_bytes=10116, priority=9,reg6=0x90000b00 actions=output:"tapde1ea417-33" 160: cookie=0x6900000, duration=263.332s, table=220, n_packets=98, n_bytes=10840, priority=6,reg6=0xb00 actions=load:0x90000b00->NXM_NX_REG6[],write_metadata:0/0xfffffffffe,goto_table:239 161: cookie=0x8000007, duration=252.629s, table=220, n_packets=0, n_bytes=0, priority=9,reg6=0x90000e00 actions=output:"tap0dd6c327-54" 162: cookie=0x8000007, duration=252.629s, table=220, n_packets=157, n_bytes=12997, priority=9,reg6=0xe00 actions=output:"tap0dd6c327-54" 163: cookie=0x8000007, duration=240.479s, table=220, n_packets=100, n_bytes=10394, priority=9,reg6=0x90001200 actions=output:"tap04bf4472-dd" 164: cookie=0x6900000, duration=240.479s, table=220, n_packets=106, n_bytes=10814, priority=6,reg6=0x1200 actions=load:0x90001200->NXM_NX_REG6[],write_metadata:0/0xfffffffffe,goto_table:239 karaf.log from the above failing case. |
| Comment by Jamo Luhrsen [ 15/Dec/18 ] |
|
Reproduced with TRACE debugging for openflowplugin. |
| Comment by Vishal Thapar [ 17/Dec/18 ] |
|
aswins Which code is responsible for replacing the flow? Will give an idea for markers in case any exceptions in netvirt code. |
| Comment by Aswin Suryanarayanan [ 17/Dec/18 ] |
|
First the interface will be created with out port security later Acl service will update the binding From the logs order in which OF plugin revives the flow programming request seems to be the issue. In the failing case the the flow mod with updated regex after acl service is bound is received before the default bind. thapar Should this be an issue in genius? Failing Case 2018-12-15T01:57:27,014 | DEBUG | nc-exe-26 | FrmUtil | 369 - org.opendaylight.openflowplugin.applications.forwardingrules-manager - 0.8.0.SNAPSHOT | Check if flow Flow{getCookie=FlowCookie{_value=134217735}, getFlowName=default.a992567a-ae6c-4acf-9b6c-fda693ac6f18, getHardTimeout=0, getId=Uri{_value=75753836899984.220.a992567a-ae6c-4acf-9b6c-fda693ac6f18.9}, getIdleTimeout=0, getInstructions=Instructions{getInstruction=[Instruction{getInstruction=ApplyActionsCase{getApplyActions=ApplyActions{getAction=[Action{getAction=OutputActionCase{getOutputAction=OutputAction{getMaxLength=0, getOutputNodeConnector=Uri{_value=12}, augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}], augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}], augmentations={}}, getMatch=Match{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.general.rev140714.GeneralAugMatchNodesNodeTableFlow=GeneralAugMatchNodesNodeTableFlow{getExtensionList=[ExtensionList{getExtension=Extension{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxAugMatchNodesNodeTableFlow=NxAugMatchNodesNodeTableFlow{getNxmNxReg=NxmNxReg{getReg=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowjava.nx.match.rev140421.NxmNxReg6, getValue=*2415926784*, augmentations={}}}}}, getExtensionKey=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxmNxReg6Key, augmentations={}}]}}}, getPriority=9, getTableId=220, isBarrier=false, isInstallHw=true, isStrict=true, augmentations={}} is dependent on group 2018-12-15T01:57:27,024 | DEBUG | nc-exe-30 | FrmUtil | 369 - org.opendaylight.openflowplugin.applications.forwardingrules-manager - 0.8.0.SNAPSHOT | Check if flow Flow{getCookie=FlowCookie{_value=134217735}, getFlowName=default.a992567a-ae6c-4acf-9b6c-fda693ac6f18, getHardTimeout=0, getId=Uri{_value=75753836899984.220.a992567a-ae6c-4acf-9b6c-fda693ac6f18.0}, getIdleTimeout=0, getInstructions=Instructions{getInstruction=[Instruction{getInstruction=ApplyActionsCase{getApplyActions=ApplyActions{getAction=[Action{getAction=OutputActionCase{getOutputAction=OutputAction{getMaxLength=0, getOutputNodeConnector=Uri{_value=12}, augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}], augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}], augmentations={}}, getMatch=Match{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.general.rev140714.GeneralAugMatchNodesNodeTableFlow=GeneralAugMatchNodesNodeTableFlow{getExtensionList=[ExtensionList{getExtension=Extension{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxAugMatchNodesNodeTableFlow=NxAugMatchNodesNodeTableFlow{getNxmNxReg=NxmNxReg{getReg=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowjava.nx.match.rev140421.NxmNxReg6, getValue=*7680*, augmentations={}}}}}, getExtensionKey=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxmNxReg6Key, augmentations={}}]}}}, getPriority=9, getTableId=220, isBarrier=false, isInstallHw=true, isStrict=true, augmentations={}} is dependent on group Success Case 2018-12-15T01:47:51,184 | DEBUG | nc-exe-20 | FrmUtil | 369 - org.opendaylight.openflowplugin.applications.forwardingrules-manager - 0.8.0.SNAPSHOT | Check if flow Flow{getCookie=FlowCookie{_value=134217735}, getFlowName=default.d2cf928a-1c67-48f3-ba78-490534828239, getHardTimeout=0, getId=Uri{_value=260122599241271.220.d2cf928a-1c67-48f3-ba78-490534828239.0}, getIdleTimeout=0, getInstructions=Instructions{getInstruction=[Instruction{getInstruction=ApplyActionsCase{getApplyActions=ApplyActions{getAction=[Action{getAction=OutputActionCase{getOutputAction=OutputAction{getMaxLength=0, getOutputNodeConnector=Uri{_value=8}, augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}], augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}], augmentations={}}, getMatch=Match{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.general.rev140714.GeneralAugMatchNodesNodeTableFlow=GeneralAugMatchNodesNodeTableFlow{getExtensionList=[ExtensionList{getExtension=Extension{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxAugMatchNodesNodeTableFlow=NxAugMatchNodesNodeTableFlow{getNxmNxReg=NxmNxReg{getReg=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowjava.nx.match.rev140421.NxmNxReg6, getValue=*2816*, augmentations={}}}}}, getExtensionKey=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxmNxReg6Key, augmentations={}}]}}}, getPriority=9, getTableId=220, isBarrier=false, isInstallHw=true, isStrict=true, augmentations={}} is dependent on group 2018-12-15T01:47:51,207 | DEBUG | nc-exe-20 | FrmUtil | 369 - org.opendaylight.openflowplugin.applications.forwardingrules-manager - 0.8.0.SNAPSHOT | Check if flow Flow{getCookie=FlowCookie{_value=134217735}, getFlowName=default.d2cf928a-1c67-48f3-ba78-490534828239, getHardTimeout=0, getId=Uri{_value=260122599241271.220.d2cf928a-1c67-48f3-ba78-490534828239.9}, getIdleTimeout=0, getInstructions=Instructions{getInstruction=[Instruction{getInstruction=ApplyActionsCase{getApplyActions=ApplyActions{getAction=[Action{getAction=OutputActionCase{getOutputAction=OutputAction{getMaxLength=0, getOutputNodeConnector=Uri{_value=8}, augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}], augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}], augmentations={}}, getMatch=Match{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.general.rev140714.GeneralAugMatchNodesNodeTableFlow=GeneralAugMatchNodesNodeTableFlow{getExtensionList=[ExtensionList{getExtension=Extension{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxAugMatchNodesNodeTableFlow=NxAugMatchNodesNodeTableFlow{getNxmNxReg=NxmNxReg{getReg=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowjava.nx.match.rev140421.NxmNxReg6, getValue=*2415921920*, augmentations={}}}}}, getExtensionKey=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxmNxReg6Key, augmentations={}}]}}}, getPriority=9, getTableId=220, isBarrier=false, isInstallHw=true, isStrict=true, augmentations={}} is dependent on group |
| Comment by Vishal Thapar [ 18/Dec/18 ] |
|
Could be. We should also look at genius servicebinding logs to see if genius is getting service bindings in correct order or not. k.faseela The flow with output to port is the default binding created by IFM, right? |
| Comment by Faseela K [ 18/Dec/18 ] |
|
thapar : Yes! |
| Comment by Vishal Thapar [ 18/Dec/18 ] |
|
k.faseela Looks like aswins had right idea but used wrong log entries to explain point. Issue is the default action with output to port is being added AFTER the action to go to Table239 in failure case. SUCCESS 2018-12-15T01:47:51,184 | DEBUG | nc-exe-20 | FrmUtil | 369 - org.opendaylight.openflowplugin.applications.forwardingrules-manager - 0.8.0.SNAPSHOT | Check if flow Flow{getCookie=FlowCookie{_value=134217735}, getFlowName=default.d2cf928a-1c67-48f3-ba78-490534828239, getHardTimeout=0, getId=Uri{_value=260122599241271.220.d2cf928a-1c67-48f3-ba78-490534828239.0}, getIdleTimeout=0, getInstructions=Instructions{getInstruction=[Instruction{getInstruction=ApplyActionsCase{getApplyActions=ApplyActions{getAction=[Action{getAction=OutputActionCase{getOutputAction=OutputAction{getMaxLength=0, *getOutputNodeConnector=Uri{_value=8}*, augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}], augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}], augmentations={}}, getMatch=Match{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.general.rev140714.GeneralAugMatchNodesNodeTableFlow=GeneralAugMatchNodesNodeTableFlow{getExtensionList=[ExtensionList{getExtension=Extension{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxAugMatchNodesNodeTableFlow=NxAugMatchNodesNodeTableFlow{getNxmNxReg=NxmNxReg{getReg=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowjava.nx.match.rev140421.NxmNxReg6, getValue=*2816*, augmentations={}}}}}, getExtensionKey=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxmNxReg6Key, augmentations={}}]}}}, getPriority=9, getTableId=220, isBarrier=false, isInstallHw=true, isStrict=true, augmentations={}} is dependent on group FAILURE 2018-12-15T01:57:27,016 | DEBUG | nc-exe-26 | FrmUtil | 369 - org.opendaylight.openflowplugin.applications.forwardingrules-manager - 0.8.0.SNAPSHOT | Check if flow Flow{getCookie=FlowCookie{_value=110100480}, getFlowName=acl.egressacl.a992567a-ae6c-4acf-9b6c-fda693ac6f18, getHardTimeout=0, getId=Uri{_value=75753836899984.220.a992567a-ae6c-4acf-9b6c-fda693ac6f18.0}, getIdleTimeout=0, getInstructions=Instructions{getInstruction=[Instruction{getInstruction=WriteMetadataCase{getWriteMetadata=WriteMetadata{getMetadata=0, getMetadataMask=1099511627774, augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}, Instruction{getInstruction=GoToTableCase{getGoToTable=GoToTable{getTableId=239, augmentations={}}, augmentations={}}, getOrder=1, augmentations={}}, Instruction{getInstruction=ApplyActionsCase{getApplyActions=ApplyActions{getAction=[Action{getAction=NxActionRegLoadNodesNodeTableFlowApplyActionsCase{getNxRegLoad=NxRegLoad{getDst=Dst{getDstChoice=DstNxRegCase{getNxReg=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowjava.nx.match.rev140421.NxmNxReg6, augmentations={}}, getEnd=31, getStart=0, augmentations={}}, getValue=2415926784, augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}], augmentations={}}, augmentations={}}, getOrder=2, augmentations={}}], augmentations={}}, getMatch=Match{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.general.rev140714.GeneralAugMatchNodesNodeTableFlow=GeneralAugMatchNodesNodeTableFlow{getExtensionList=[ExtensionList{getExtension=Extension{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxAugMatchNodesNodeTableFlow=NxAugMatchNodesNodeTableFlow{getNxmNxReg=NxmNxReg{getReg=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowjava.nx.match.rev140421.NxmNxReg6, getValue=*7680*, augmentations={}}}}}, getExtensionKey=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxmNxReg6Key, augmentations={}}]}}}, getPriority=6, getTableId=220, isBarrier=false, isInstallHw=true, isStrict=true, augmentations={}} is dependent on group |
| Comment by Faseela K [ 18/Dec/18 ] |
|
thapar : Paste the flow on switch in working and non-working case? |
| Comment by Aswin Suryanarayanan [ 18/Dec/18 ] |
|
Not Working cookie=0x6900000, duration=248.818s, table=220, n_packets=0, n_bytes=0, priority=6,reg6=0x1e00 actions=load:0x90001e00->NXM_NX_REG6[],write_metadata:0/0xfffffffffe,goto_table:239 Success cookie=0x6900000, duration=266.575s, table=220, n_packets=99, n_bytes=9638, priority=6,reg6=0x1b00 actions=load:0x90001b00->NXM_NX_REG6[],write_metadata:0/0xfffffffffe,goto_table:239 More details at [1] [1]https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/builder-copy-sandbox-logs/571/jamo-netvirt-1538-netvirt-csit-1node-0cmb-1ctl-2cmp-apex-queens-gate-snat-conntrack-oxygen/5/robot-plugin/log_02_l2.html.gz#s1-k2-k1-k1 |
| Comment by Faseela K [ 18/Dec/18 ] |
|
thapar Please supply a run with TRACE enabled for org.opendaylight.genius.interfacemanager |
| Comment by Vishal Thapar [ 18/Dec/18 ] |
|
Trace logs for failed port in 220: 2018-12-15T01:57:27,014 | DEBUG | nc-exe-26 | FrmUtil | 369 - org.opendaylight.openflowplugin.applications.forwardingrules-manager - 0.8.0.SNAPSHOT | Check if flow Flow{getCookie=FlowCookie{_value=134217735}, getFlowName=default.a992567a-ae6c-4acf-9b6c-fda693ac6f18, getHardTimeout=0, getId=Uri{_value=75753836899984.220.a992567a-ae6c-4acf-9b6c-fda693ac6f18.9}, getIdleTimeout=0, getInstructions=Instructions{getInstruction=[Instruction{getInstruction=ApplyActionsCase{getApplyActions=ApplyActions{getAction=[Action{getAction=OutputActionCase{getOutputAction=OutputAction{getMaxLength=0, getOutputNodeConnector=Uri{_value=12}, augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}], augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}], augmentations={}}, getMatch=Match{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.general.rev140714.GeneralAugMatchNodesNodeTableFlow=GeneralAugMatchNodesNodeTableFlow{getExtensionList=[ExtensionList{getExtension=Extension{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxAugMatchNodesNodeTableFlow=NxAugMatchNodesNodeTableFlow{getNxmNxReg=NxmNxReg{getReg=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowjava.nx.match.rev140421.NxmNxReg6, getValue=2415926784, augmentations={}}}}}, getExtensionKey=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxmNxReg6Key, augmentations={}}]}}}, getPriority=9, getTableId=220, isBarrier=false, isInstallHw=true, isStrict=true, augmentations={}} is dependent on group Flows in OVS [compute1]: cookie=0x8000007, duration=248.821s, table=220, n_packets=0, n_bytes=0, priority=9,reg6=0x90001e00 actions=output:"tapa992567a-ae" |
| Comment by Jamo Luhrsen [ 18/Dec/18 ] |
|
I will work on this today. I'll just enable this module for TRACE and let it run. Keep in mind, k.faseela, that |
| Comment by Jamo Luhrsen [ 18/Dec/18 ] |
|
this job is where I'm trying to get this |
| Comment by Faseela K [ 19/Dec/18 ] |
|
d.arunprakash@ericsson.com : We need your help in debugging this from openflowplugin side.. |
| Comment by Arunprakash D [ 19/Dec/18 ] |
|
Somashekhar is checking this. |
| Comment by Somashekhar Javalagi [ 19/Dec/18 ] |
|
I checked the flow programmed by openflowplugin. For table=220 and reg6=0x1e00, found below flow-mods received by plugin in ovs-vswitchd logs, 2018-12-15T01:57:27.289Z|02466|vconn|DBG|tcp:192.0.2.19:6653: received: OFPT_FLOW_MOD (OF1.3) (xid=0x242): DEL_STRICT table:220 priority=9,reg6=0x1e00 actions=output:12 2018-12-15T01:57:27.290Z|02468|vconn|DBG|tcp:192.0.2.19:6653: received: OFPT_FLOW_MOD (OF1.3) (xid=0x244): ADD table:220 priority=6,reg6=0x1e00 cookie:0x6900000 actions=load:0x90001e00->NXM_NX_REG6[],write_metadata:0/0xfffffffffe,goto_table:239 2018-12-15T01:57:27.291Z|02471|vconn|DBG|tcp:192.0.2.19:6653: received: OFPT_FLOW_MOD (OF1.3) (xid=0x247): ADD table:220 priority=9,reg6=0x1e00 cookie:0x8000007 actions=output:12 2018-12-15T02:02:08.942Z|03073|vconn|DBG|tcp:192.0.2.19:6653: received: OFPT_FLOW_MOD (OF1.3) (xid=0x310): DEL_STRICT table:220 priority=6,reg6=0x1e00 actions=load:0x90001e00->NXM_NX_REG6[],write_metadata:0/0xfffffffffe,goto_table:239 2018-12-15T02:02:08.942Z|03074|vconn|DBG|tcp:192.0.2.19:6653: received: OFPT_FLOW_MOD (OF1.3) (xid=0x311): ADD table:220 priority=9,reg6=0x1e00 cookie:0x8000007 actions=output:12
For the corresponding flow-mods, there are respective changes pushed by the plugin in karaf logs. 2018-12-15T01:57:27,016 | DEBUG | nc-exe-26 | FrmUtil | 369 - org.opendaylight.openflowplugin.applications.forwardingrules-manager - 0.8.0.SNAPSHOT | Check if flow Flow{getCookie=FlowCookie{_value=110100480}, getFlowName=acl.egressacl.a992567a-ae6c-4acf-9b6c-fda693ac6f18, getHardTimeout=0, getId=Uri{_value=75753836899984.220.a992567a-ae6c-4acf-9b6c-fda693ac6f18.0}, getIdleTimeout=0, getInstructions=Instructions{getInstruction=[Instruction{getInstruction=WriteMetadataCase{getWriteMetadata=WriteMetadata{getMetadata=0, getMetadataMask=1099511627774, augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}, Instruction{getInstruction=GoToTableCase{getGoToTable=GoToTable{getTableId=239, augmentations={}}, augmentations={}}, getOrder=1, augmentations={}}, Instruction{getInstruction=ApplyActionsCase{getApplyActions=ApplyActions{getAction=[Action{getAction=NxActionRegLoadNodesNodeTableFlowApplyActionsCase{getNxRegLoad=NxRegLoad{getDst=Dst{getDstChoice=DstNxRegCase{getNxReg=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowjava.nx.match.rev140421.NxmNxReg6, augmentations={}}, getEnd=31, getStart=0, augmentations={}}, getValue=2415926784, augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}], augmentations={}}, augmentations={}}, getOrder=2, augmentations={}}], augmentations={}}, getMatch=Match{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.general.rev140714.GeneralAugMatchNodesNodeTableFlow=GeneralAugMatchNodesNodeTableFlow{getExtensionList=[ExtensionList{getExtension=Extension{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxAugMatchNodesNodeTableFlow=NxAugMatchNodesNodeTableFlow{getNxmNxReg=NxmNxReg{getReg=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowjava.nx.match.rev140421.NxmNxReg6, getValue=7680, augmentations={}}}}}, getExtensionKey=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxmNxReg6Key, augmentations={}}]}}}, getPriority=6, getTableId=220, isBarrier=false, isInstallHw=true, isStrict=true, augmentations={}} is dependent on group 2018-12-15T01:57:27,024 | DEBUG | nc-exe-30 | FrmUtil | 369 - org.opendaylight.openflowplugin.applications.forwardingrules-manager - 0.8.0.SNAPSHOT | Check if flow Flow{getCookie=FlowCookie{_value=134217735}, getFlowName=default.a992567a-ae6c-4acf-9b6c-fda693ac6f18, getHardTimeout=0, getId=Uri{_value=75753836899984.220.a992567a-ae6c-4acf-9b6c-fda693ac6f18.0}, getIdleTimeout=0, getInstructions=Instructions{getInstruction=[Instruction{getInstruction=ApplyActionsCase{getApplyActions=ApplyActions{getAction=[Action{getAction=OutputActionCase{getOutputAction=OutputAction{getMaxLength=0, getOutputNodeConnector=Uri{_value=12}, augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}], augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}], augmentations={}}, getMatch=Match{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.general.rev140714.GeneralAugMatchNodesNodeTableFlow=GeneralAugMatchNodesNodeTableFlow{getExtensionList=[ExtensionList{getExtension=Extension{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxAugMatchNodesNodeTableFlow=NxAugMatchNodesNodeTableFlow{getNxmNxReg=NxmNxReg{getReg=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowjava.nx.match.rev140421.NxmNxReg6, getValue=7680, augmentations={}}}}}, getExtensionKey=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxmNxReg6Key, augmentations={}}]}}}, getPriority=9, getTableId=220, isBarrier=false, isInstallHw=true, isStrict=true, augmentations={}} is dependent on group 2018-12-15T02:02:08,667 | DEBUG | nc-exe-31 | FrmUtil | 369 - org.opendaylight.openflowplugin.applications.forwardingrules-manager - 0.8.0.SNAPSHOT | Check if flow Flow{getCookie=FlowCookie{_value=134217735}, getFlowName=default.a992567a-ae6c-4acf-9b6c-fda693ac6f18, getHardTimeout=0, getId=Uri{_value=75753836899984.220.a992567a-ae6c-4acf-9b6c-fda693ac6f18.0}, getIdleTimeout=0, getInstructions=Instructions{getInstruction=[Instruction{getInstruction=ApplyActionsCase{getApplyActions=ApplyActions{getAction=[Action{getAction=OutputActionCase{getOutputAction=OutputAction{getMaxLength=0, getOutputNodeConnector=Uri{_value=12}, augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}], augmentations={}}, augmentations={}}, getOrder=0, augmentations={}}], augmentations={}}, getMatch=Match{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.general.rev140714.GeneralAugMatchNodesNodeTableFlow=GeneralAugMatchNodesNodeTableFlow{getExtensionList=[ExtensionList{getExtension=Extension{augmentations={interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxAugMatchNodesNodeTableFlow=NxAugMatchNodesNodeTableFlow{getNxmNxReg=NxmNxReg{getReg=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowjava.nx.match.rev140421.NxmNxReg6, getValue=7680, augmentations={}}}}}, getExtensionKey=interface org.opendaylight.yang.gen.v1.urn.opendaylight.openflowplugin.extension.nicira.match.rev140714.NxmNxReg6Key, augmentations={}}]}}}, getPriority=9, getTableId=220, isBarrier=false, isInstallHw=true, isStrict=true, augmentations={}} is dependent on group These logs are for flow-add events. We don't have any logs in code for flow delete events.
|
| Comment by Faseela K [ 20/Dec/18 ] |
|
so what does that mean now? delete is not pushed to the switch? Did you check the output of config/inventory? we see only two flows there, but there are 3 flows on the switch. What would you need to further debug the issue? |
| Comment by Somashekhar Javalagi [ 20/Dec/18 ] |
|
We are not logging any message after deleting the flows. The config inventory is captured at the teardown state after which some flows programmed for table 220. thapar can I get config inventory dump after all the tests? |
| Comment by Jamo Luhrsen [ 20/Dec/18 ] |
|
k.faseela, is the config/opendaylight-inventory:nodes model dump what you want? |
| Comment by Jamo Luhrsen [ 20/Dec/18 ] |
|
I'm having some trouble. 1) even though I have org.opendaylight.genius.interfacemanager set to TRACE in the logging config file 2) I cannot hit this bug right now. this sandbox job has run for 80+
|
| Comment by Faseela K [ 21/Dec/18 ] |
|
Yes Jamo. What Som is saying is the current teardown dump is captured before the log entry for flows is showing up in the karaf log. So he is suspecting if the teardown was captured at a letr point of time, inventory dump would have had 3 flows |
| Comment by Faseela K [ 21/Dec/18 ] |
|
Looks like you should set only genius.interfacemanager, the config file now says log4j2.logger.org_opendaylight_org_opendaylight_genius_interfacemanager.name = org.opendaylight.org.opendaylight.genius.interfacemanager which is wrong. |
| Comment by Somashekhar Javalagi [ 21/Dec/18 ] |
|
We are receiving add flow event with flow of priority 9 and then we are receiving update flow event for flow with priority 9 to replace it with flow with priority 6. So eventually we have to do, ADD event: ADD flow priority = 9 UPDATE event : DELETE flow priority = 9 ADD flow priority = 6.
But the UPDATE event is processed first before ADD event. Which is first trying to DELETE flow which doesn't exist followed by adding both the flows. Though DataTreeChange notification is received in proper order, there is some problem in processing the flow mod events. |
| Comment by Faseela K [ 21/Dec/18 ] |
|
Somashekhar : Do you know how to fix this? Or you need any further information? |
| Comment by Faseela K [ 21/Dec/18 ] |
|
jluhrsen : Somashekhar has confirmed that this is something wrong in openflowplugin, so move the Jira to ofp? |
| Comment by Jamo Luhrsen [ 21/Dec/18 ] |
|
I'm not totally following what we want then. The link I gave in my comment is the |
| Comment by Jamo Luhrsen [ 21/Dec/18 ] |
|
will do |
| Comment by Somashekhar Javalagi [ 21/Jan/19 ] |
|
jluhrsen, Can you please enable below log level and reproduce the same? log:set TRACE org.opendaylight.openflowplugin.applications.frm.nodeconfigurator.NodeConfiguratorImpl To see the queuing order of flow-mod requests? |
| Comment by Jamo Luhrsen [ 21/Jan/19 ] |
|
Somashekhar, I wont have time to look at this right now. But, if someone can make |
| Comment by Somashekhar Javalagi [ 06/Feb/19 ] |
|
I am keep on trying, but not yet able to reproduce. I will update my analysis for added logs onc I am able to reproduce. |
| Comment by Somashekhar Javalagi [ 19/Mar/19 ] |
|
Created a sandbox and scheduled it to run every hour for a week, but could not see any failures and hence was not able to reproduce. Can be reopened if it is seen again. |