[OPNFLWPLUG-350] Flow present in configuration data store but not operational when running GBP DevStack integration or Docker integration. Created: 30/Jan/15  Updated: 27/Sep/21  Resolved: 18/Jun/15

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Thomas Bachman Assignee: Michal Rehak
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Mac OS
Platform: All


Attachments: File datastores_and_flows.tar.gz     File inventory.tar.gz     File karaf-logs.tar.gz     File karaflogs.tar.gz     File port-delete-marked.pcapng     File port-delete-port-add.tar.gz    
Issue Links:
Blocks
blocks GBP-19 Meta-bug for tracking dependent bugs ... Resolved
blocks GBP-70 Missing flow-mod when running with de... Resolved
External issue ID: 2657
Priority: Low

 Description   

Refer to this email:

https://lists.opendaylight.org/pipermail/openflowplugin-dev/2015-January/002464.html



 Comments   
Comment by Thomas Bachman [ 30/Jan/15 ]

I have a way to reproduce this problem, somewhat reliably. Follow the directions here:

https://wiki.opendaylight.org/view/Group_Policy:Building_and_Running_Docker

You will only need a single VM to test this – you can just create the VM, configure the test scripts (i.e. set the controller's IP address and append --policy in start-poc.sh, and set the tunnel IP address in infrastructure_config.py), and run ./start-poc.sh (as sudo or root). Once this comes up, run the following command:

ovs-ofctl dump-flows s1 | wc -l

You should see the number 54 (there are 53 flows installed, but the number is 54 because wc also counts the header line that this command prints). You should be able to recreate the failure by then running the following commands:

./docker-clean.sh
./start-poc.sh

You'll know you've hit the failure case when the output of this:

ovs-ofctl dump-flows s1 | wc -l

shows 53 instead of 54. You should then observe 53 flows in the config data store, but only 52 flows in the operational data store.
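
To count only the flow entries themselves, skipping the NXST_FLOW header line that ovs-ofctl prints, a simple variant of the check is:

ovs-ofctl dump-flows s1 | tail -n +2 | wc -l

This should report 53 in the passing case and 52 when the failure occurs.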

Comment by Keith Burns [ 30/Jan/15 ]

We believe the root cause of this is that when a port is deleted via "ovs-vsctl", a packet is generated by OVS (marked in the trace), but the operational datastore does not remove this "port" (node-connector).

We rely on the datastore being correct in order to know where to instantiate flows.
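
For reference, the delete/add sequence that triggers this looks like the following sketch (the port name here is illustrative; the actual name depends on the Docker test setup):

ovs-vsctl del-port s1 veth0
ovs-vsctl add-port s1 veth0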

Comment by Keith Burns [ 30/Jan/15 ]

Attachment port-delete-marked.pcapng has been added with description: Wireshark Trace of DeletePort

Comment by Thomas Bachman [ 03/Feb/15 ]

The missing flow seems to appear in table 2. Below is a dump of the flows that are present in table 2:

root@odlgbp1:~/work/groupbasedpolicy/util/dockerTestOfOverlay# ovs-ofctl dump-flows s1 table=2
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=236.434s, table=2, n_packets=0, n_bytes=0, idle_age=236, priority=1 actions=drop
cookie=0x0, duration=236.434s, table=2, n_packets=0, n_bytes=0, idle_age=236, priority=140,reg5=0x13,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31]
cookie=0x0, duration=236.434s, table=2, n_packets=0, n_bytes=0, idle_age=236, priority=140,reg5=0x12,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31]
cookie=0x0, duration=236.434s, table=2, n_packets=0, n_bytes=0, idle_age=236, priority=150,arp,reg6=0x5,arp_tpa=10.0.36.1,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],mod_dl_src:88:f0:31:b5:12:b5,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],load:0x88f031b512b5->NXM_NX_ARP_SHA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xa002401->NXM_OF_ARP_SPA[],IN_PORT
cookie=0x0, duration=236.434s, table=2, n_packets=0, n_bytes=0, idle_age=236, priority=132,ip,reg6=0x5,dl_dst=88:f0:31:b5:12:b5,nw_dst=10.0.35.3 actions=load:0x7->NXM_NX_REG2[],load:0x6->NXM_NX_REG3[],load:0x2->NXM_NX_REG7[],mod_dl_src:88:f0:31:b5:12:b5,mod_dl_dst:00:00:00:00:35:03,dec_ttl(0)
cookie=0x0, duration=236.434s, table=2, n_packets=0, n_bytes=0, idle_age=236, priority=132,ip,reg6=0x5,dl_dst=88:f0:31:b5:12:b5,nw_dst=10.0.36.3 actions=load:0x3->NXM_NX_REG2[],load:0x6->NXM_NX_REG3[],load:0x4->NXM_NX_REG7[],mod_dl_src:88:f0:31:b5:12:b5,mod_dl_dst:00:00:00:00:36:03,dec_ttl(0)
cookie=0x0, duration=236.434s, table=2, n_packets=0, n_bytes=0, idle_age=236, priority=132,ip,reg6=0x5,dl_dst=88:f0:31:b5:12:b5,nw_dst=10.0.36.2 actions=load:0x3->NXM_NX_REG2[],load:0x6->NXM_NX_REG3[],load:0x3->NXM_NX_REG7[],mod_dl_src:88:f0:31:b5:12:b5,mod_dl_dst:00:00:00:00:36:02,dec_ttl(0)
cookie=0x0, duration=236.434s, table=2, n_packets=0, n_bytes=0, idle_age=236, priority=132,ip,reg6=0x5,dl_dst=88:f0:31:b5:12:b5,nw_dst=10.0.35.2 actions=load:0x7->NXM_NX_REG2[],load:0x6->NXM_NX_REG3[],load:0x1->NXM_NX_REG7[],mod_dl_src:88:f0:31:b5:12:b5,mod_dl_dst:00:00:00:00:35:02,dec_ttl(0)
cookie=0x0, duration=236.434s, table=2, n_packets=0, n_bytes=0, idle_age=236, priority=50,reg4=0x4,dl_dst=00:00:00:00:35:03 actions=load:0x7->NXM_NX_REG2[],load:0x6->NXM_NX_REG3[],load:0x2->NXM_NX_REG7[]
cookie=0x0, duration=236.434s, table=2, n_packets=0, n_bytes=0, idle_age=236, priority=50,reg4=0x4,dl_dst=00:00:00:00:35:02 actions=load:0x7->NXM_NX_REG2[],load:0x6->NXM_NX_REG3[],load:0x1->NXM_NX_REG7[]
cookie=0x0, duration=236.434s, table=2, n_packets=0, n_bytes=0, idle_age=236, priority=50,reg4=0x4,dl_dst=00:00:00:00:36:03 actions=load:0x3->NXM_NX_REG2[],load:0x6->NXM_NX_REG3[],load:0x4->NXM_NX_REG7[]
cookie=0x0, duration=236.434s, table=2, n_packets=0, n_bytes=0, idle_age=236, priority=50,reg4=0x4,dl_dst=00:00:00:00:36:02 actions=load:0x3->NXM_NX_REG2[],load:0x6->NXM_NX_REG3[],load:0x3->NXM_NX_REG7[]

Here is the flow that's missing from table 2:
cookie=0x0, duration=8.721s, table=2, n_packets=0, n_bytes=0, idle_age=8, priority=150,arp,reg6=0x5,arp_tpa=10.0.35.1,arp_op=1 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],mod_dl_src:88:f0:31:b5:12:b5,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],load:0x88f031b512b5->NXM_NX_ARP_SHA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xa002301->NXM_OF_ARP_SPA[],IN_PORT

Comment by Keith Burns [ 03/Feb/15 ]

Please look for any flow in CONFIG-AFTER with "35:02" (the MAC address of an endpoint). You can see this flow in CONFIG-AFTER but not in OPERATIONAL-AFTER.

Adding karaf logs in another file.

Comment by Keith Burns [ 03/Feb/15 ]

Attachment datastores_and_flows.tar.gz has been added with description: Config and Operational datastores before and after along with flow table

Comment by Keith Burns [ 03/Feb/15 ]

Attachment karaflogs.tar.gz has been added with description: Karaf logs

Comment by Keith Burns [ 03/Feb/15 ]

In the config and operational datastores archive, it would be helpful to start with:

  • nodes-operational-before.txt, nodes-config-before.txt
  • flows-before-after.txt: this file not only shows the flows, but also the steps to reproduce the problem (i.e. delete/add a port via ovs-vsctl)
  • nodes-operational-after.txt, nodes-config-after.txt: these show that flows are written for "35:02" in config but not in oper (see the quick check below)
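
A quick way to confirm this from the attached files (a hedged sketch; the grep pattern is just the endpoint MAC fragment mentioned above):

grep -c "35:02" nodes-config-after.txt        # non-zero: flow present in config
grep -c "35:02" nodes-operational-after.txt   # zero when the bug hits: flow missing from oper
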
Comment by Keith Burns [ 03/Feb/15 ]

It should also be noted that if you do a:

  • ovs-vsctl del-controller
  • ovs-vsctl set-controller etc

everything works perfectly (i.e. the "stuck" flows are then unstuck), as sketched below.
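
A minimal sketch of that workaround (the bridge name and controller URI here are assumptions, not taken from this setup):

ovs-vsctl del-controller s1
ovs-vsctl set-controller s1 tcp:controllerIp:6653
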
Comment by Thomas Bachman [ 04/Feb/15 ]

This is the output of the REST calls to inventory – both config and operational – when the flow is missing. There is also a set of both config and operational for just the table with the missing flow (table 2).
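
For anyone reproducing this, the inventory can be pulled from both datastores with RESTCONF calls along these lines (the port and credentials are assumptions based on a default distribution):

curl -u admin:admin http://controllerIp:8181/restconf/config/opendaylight-inventory:nodes
curl -u admin:admin http://controllerIp:8181/restconf/operational/opendaylight-inventory:nodes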

Comment by Thomas Bachman [ 04/Feb/15 ]

Attachment inventory.tar.gz has been added with description: Another capture of full oper/config inventory when this happens, and just the table 2 oper/config when this happens (i.e. the table with the missing flow)

Comment by Thomas Bachman [ 04/Feb/15 ]

This has the karaf logs for that run. Note that I had to create and destroy the ports multiple times to make this happen, so you'll see them come and go a few times. The last run has the failure.

Comment by Thomas Bachman [ 04/Feb/15 ]

Attachment karaf-logs.tar.gz has been added with description: Here are the karaf log files for that run

Comment by Keith Burns [ 04/Feb/15 ]

After working with Michal Rehak, we determined that for some reason flows are making it to CONFIG but not to OPER (we also ran LOG.info in frm.add() to confirm).

Comment by Keith Burns [ 04/Feb/15 ]

Attachment port-delete-port-add.tar.gz has been added with description: Wireshark trace of port-add-delete and flowmods

Comment by Abhijit Kumbhare [ 12/Mar/15 ]

Any update on this, Michal?

Comment by Thomas Bachman [ 14/Apr/15 ]

FYI – we found another way to make this happen, very reliably. Using master, run the POC, and do the following:

PUT http://controllerIp:port/restconf/config/policy:tenants/policy:tenant/f5c7d344-d1c7-4208-8531-2c2693657e12/l3-context/c0b0edd4-c391-4816-bd07-bf5bb73e3334

body:
{
    "l3-context": [
        { "id": "c0b0edd4-c391-4816-bd07-bf5bb73e3334" }
    ]
}

followed by:

PUT http://controllerIp:port/restconf/config/policy:tenants/policy:tenant/f5c7d344-d1c7-4208-8531-2c2693657e12/l2-bridge-domain/7b796915-adf4-4356-b5ca-de005ac410c1
body:
{
    "l2-bridge-domain": [
        { "id": "7b796915-adf4-4356-b5ca-de005ac410c1", "parent": "c0b0edd4-c391-4816-bd07-bf5bb73e3334" }
    ]
}
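
For convenience, the two calls above can be issued with curl along these lines (the credentials are an assumption based on a default distribution):

curl -u admin:admin -X PUT -H "Content-Type: application/json" \
  -d '{"l3-context": [{"id": "c0b0edd4-c391-4816-bd07-bf5bb73e3334"}]}' \
  http://controllerIp:port/restconf/config/policy:tenants/policy:tenant/f5c7d344-d1c7-4208-8531-2c2693657e12/l3-context/c0b0edd4-c391-4816-bd07-bf5bb73e3334

curl -u admin:admin -X PUT -H "Content-Type: application/json" \
  -d '{"l2-bridge-domain": [{"id": "7b796915-adf4-4356-b5ca-de005ac410c1", "parent": "c0b0edd4-c391-4816-bd07-bf5bb73e3334"}]}' \
  http://controllerIp:port/restconf/config/policy:tenants/policy:tenant/f5c7d344-d1c7-4208-8531-2c2693657e12/l2-bridge-domain/7b796915-adf4-4356-b5ca-de005ac410c1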

Comment by Abhijit Kumbhare [ 05/Jun/15 ]

Lowering the priority to normal and moving the milestone to the last Lithium RC (to fix only if time permits), as GBP has a workaround per Thomas Bachman.

Comment by Abhijit Kumbhare [ 12/Jun/15 ]

Just saw that this has been marked as blocking 3462 (which itself is marked as a blocker). Kamal – is this something you can look into with Michal's or Anil's help (Anil is back from vacation on Monday)? Of course, Michal, if you can work on this that will be great.

Comment by Keith Burns [ 12/Jun/15 ]

Yep, may as well get this over with. This has gone on far too long. We see flows in CONFIG, but they aren't in OPER. We have to reset the controller to get them into OVS.

We have tried everything from modifying our matchers to setting up a standard FlowId pattern that automatically generates a FlowId from the Match passed to it, and we still see this.

Comment by Luis Gomez [ 15/Jun/15 ]

So looking at the flows that get an alien id (weird flow id) in operational, I see 2 issues:

1) As Anil commented this morning, the current config<->operational matching function does not look at extension matches. The workaround for this is to use flow cookies (see the sketch after the examples below).

2) It seems the MAC address mask is not reported in operational, which could also impact the flow matching function:

config:

"ethernet-match": {
    "ethernet-destination": {
        "address": "01:00:00:00:00:00",
        "mask": "01:00:00:00:00:00"
    }
}

operational:

"ethernet-match": {
    "ethernet-destination": {
        "address": "01:00:00:00:00:00"
    }
}
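
To illustrate the cookie workaround from point 1): the idea is to write each flow with a unique cookie, so that config and operational entries can be correlated by cookie instead of by (extension) match fields. Below is a hedged sketch over RESTCONF; the node/table/flow ids, cookie value, and exact JSON shape are illustrative, not taken from this setup:

# Hedged sketch: tag the flow with a unique cookie so config/operational
# entries can be matched by cookie; all ids and values are illustrative
curl -u admin:admin -X PUT -H "Content-Type: application/json" \
  http://controllerIp:8181/restconf/config/opendaylight-inventory:nodes/node/openflow:1/table/2/flow/example-flow \
  -d '{
    "flow": [
      {
        "id": "example-flow",
        "table_id": 2,
        "priority": 1,
        "cookie": 42,
        "match": {},
        "instructions": {
          "instruction": [
            { "order": 0, "apply-actions": { "action": [ { "order": 0, "drop-action": {} } ] } }
          ]
        }
      }
    ]
  }'

The same cookie then shows up on the operational side and in ovs-ofctl dump-flows output, so a missing flow can be identified without comparing match fields.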

Comment by Keith Burns [ 18/Jun/15 ]

Was unable to reproduce in a lab environment with the latest GBP stable/lithium. Assuming that becomes part of the RC tonight, I'm OK with closing this after that.

Already updated blocking bug list.
