[OPNFLWPLUG-875] Switch scalability regression due to missing table miss flows Created: 29/Mar/17 Updated: 27/Sep/21 Resolved: 16/Oct/17 |
|
| Status: | Resolved |
| Project: | OpenFlowPlugin |
| Component/s: | General |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Luis Gomez | Assignee: | Luis Gomez |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| External issue ID: | 8103 |
| Description |
|
This is detected here for both Carbon and Boron: basically, the scalability test fails because of missing table-miss flows, which are required for topology discovery. |
| Comments |
| Comment by Tomas Slusny [ 30/Mar/17 ] |
|
Isn't this a duplicate of 7770? |
| Comment by Luis Gomez [ 30/Mar/17 ] |
|
Very possibly; we can fix 7770 and then recheck this one. |
| Comment by Luis Gomez [ 30/Mar/17 ] |
|
Note also that 7770 is cluster-related, while this one happens with a single instance. |
| Comment by D Arunprakash [ 28/Apr/17 ] |
|
Hi Luis, the scalability TCs are now passing in both Boron and Carbon. Regards, |
| Comment by Luis Gomez [ 28/Apr/17 ] |
|
Yes, they are passing, but see the regression from 500 to 200 switches: The regression happened a while back, around the beginning of the year. BTW, if I remove the topology test, the test goes back to 500 switches, because the issue is related to missing table-miss flows. BR/Luis |
| Comment by D Arunprakash [ 03/May/17 ] |
|
Thanks Luis. Since it's new to me, could you please help me with the steps to disable the topology tests and run the regression? Regards, |
| Comment by Luis Gomez [ 09/May/17 ] |
|
To reproduce the issue, you can just generate multiple iterations of a Mininet linear topology with 100, 200, 300 nodes, etc. After a while, you will observe that not all links are properly discovered. Nodes are fine, though. |
| Comment by Luis Gomez [ 10/May/17 ] |
|
I always thought the issue was in the table-miss application, but the issue could actually be more related to this ERROR present in all the karaf logs [1] for this test:
2017-05-10 07:13:04,862 | ERROR | pool-27-thread-1 | OutboundQueueProviderImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.0.Carbon | No queue present, failing request |
| Comment by Luis Gomez [ 10/May/17 ] |
|
Note this issue can be connected to: https://bugs.opendaylight.org/show_bug.cgi?id=8401 |
| Comment by Luis Gomez [ 10/May/17 ] |
|
Note that the karaf log level for the scale/perf test is set to ERROR only. I can enable full debug logging if required. |
| Comment by D Arunprakash [ 11/May/17 ] |
|
Luis, https://bugs.opendaylight.org/show_bug.cgi?id=8401 Regards, |
| Comment by Luis Gomez [ 14/May/17 ] |
|
The log is too big to attach; you can grab it from here: After looking at it, I do not think it is the same as BR/Luis |
| Comment by D Arunprakash [ 15/May/17 ] |
|
The following exception is seen several times in the karaf log:
2017-05-14 18:54:15,488 | WARN | entLoopGroup-7-5 | OFFrameDecoder | 182 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.9.0.SNAPSHOT | Unexpected exception from downstream.
2017-05-14 19:06:26,955 | WARN | entLoopGroup-7-2 | OFFrameDecoder | 182 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.9.0.SNAPSHOT | Unexpected exception from downstream.
I'm seeing the below error as well:
2017-05-14 19:06:24,160 | ERROR | pool-31-thread-1 | OutboundQueueProviderImpl | 193 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | No queue present, failing request |
| Comment by Luis Gomez [ 16/May/17 ] |
|
OK, I just reproduced the issue on my laptop using 300 switches. I can see all switches come up fine, but some do not have the table-miss flow. There is nothing relevant in karaf.log when this happens. Since the table-miss flow is a test application, I will modify the robot test to skip this app and use NB-pushed table-miss flows instead. |
| Comment by Anil Vishnoi [ 16/May/17 ] |
|
I believe the following patch should fix the "No queue present" issue reported in the above comment. |
| Comment by Luis Gomez [ 16/May/17 ] |
|
Good point; let me try with the latest Carbon to see if it is any better. |
| Comment by Luis Gomez [ 16/May/17 ] |
|
And it works in Carbon: I do not see missing table-miss flows with the latest code. Anil, is it possible to cherry-pick the change to Boron? |
| Comment by Luis Gomez [ 16/May/17 ] |
|
I was too quick to say it works; it seems better now, but I still see the same logging: |
| Comment by Luis Gomez [ 20/May/17 ] |
|
So, after reproducing this locally, I can see the main issue is still missing table-miss flows beyond ~200 switches. Nothing in the logs gives any hint of what the problem is. |
| Comment by Luis Gomez [ 22/May/17 ] |
|
To reproduce, just generate a linear,300 or linear,400 topology and check the flows in Mininet (dpctl dump-aggregate -O OpenFlow13). There should be 1 table-miss flow per switch; instead, some switches are missing the flow, and therefore topology verification fails in this test. |
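With hundreds of switches, the per-switch check above is easier to do with a script than by eye. A minimal sketch, assuming the Mininet `dpctl dump-aggregate -O OpenFlow13` output has been saved as text, that Mininet prints a `*** <switch> ----...` header before each switch's reply, and that the Open vSwitch reply carries a `flow_count=N` field. `parse_flow_counts` and `switches_missing_flows` are hypothetical helper names, and the sample text is illustrative, not taken from this test. Since `flow_count` is an aggregate over all flows on the switch, this only approximates "has a table-miss flow".

```python
import re
from typing import Dict, List

# Assumed per-switch header printed by Mininet's dpctl command, e.g. "*** s1 ------".
SWITCH_HEADER = re.compile(r"^\*\*\* (\S+) -+")
# Assumed field in the Open vSwitch aggregate reply, e.g. "... flow_count=1".
FLOW_COUNT = re.compile(r"flow_count=(\d+)")


def parse_flow_counts(dump: str) -> Dict[str, int]:
    """Map each switch name to the flow_count reported for it.

    A switch whose section has no flow_count line (e.g. the dpctl call
    failed) stays at 0 so that it is reported as missing flows.
    """
    counts: Dict[str, int] = {}
    current = None
    for line in dump.splitlines():
        header = SWITCH_HEADER.match(line)
        if header:
            current = header.group(1)
            counts[current] = 0
            continue
        match = FLOW_COUNT.search(line)
        if match and current is not None:
            counts[current] = int(match.group(1))
    return counts


def switches_missing_flows(counts: Dict[str, int], expected: int = 1) -> List[str]:
    """Return switches (in dump order) with fewer flows than expected."""
    return [name for name, count in counts.items() if count < expected]


# Illustrative two-switch dump: s2 has no flows installed.
sample = """*** s1 ------------------------------------------------------------------------
OFPST_AGGREGATE reply (OF1.3) (xid=0x2): packet_count=0 byte_count=0 flow_count=1
*** s2 ------------------------------------------------------------------------
OFPST_AGGREGATE reply (OF1.3) (xid=0x2): packet_count=0 byte_count=0 flow_count=0
"""

print(switches_missing_flows(parse_flow_counts(sample)))
```

Initializing each switch to 0 when its header is seen is a deliberate choice: a switch that returned no reply is flagged rather than silently skipped.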
| Comment by Luis Gomez [ 23/May/17 ] |
|
So the real issue seems to be flow installation after switches connect (i.e. flow reconciliation). Whether I pre-program flows in the DS or use the table-miss flow feature, I always miss flows when many switches connect in a short time. The only workaround is to install the table-miss flows after all switches are fully connected to ODL. I created this patch to prove it: |
| Comment by Sunil Kumar M S [ 15/Jun/17 ] |
|
Hello Luis, I tried to reproduce the issue with the steps you provided, gradually increasing the number of switches up to 250 before my laptop ran out of resources at 300. Please try with the patch and let us know your findings. Thanks. |
| Comment by Tomas Slusny [ 26/Jun/17 ] |
|
The patch mentioned by Sunil was merged on both Carbon and Nitrogen, and it should also fix this issue. Can you confirm this, Luis? |
| Comment by Luis Gomez [ 03/Jul/17 ] |
|
When I test in my setup, 300 switches work fine, but when I go to 400, I still see missing table-miss flows. |
| Comment by Luis Gomez [ 25/Jul/17 ] |
|
Also, this test shows that table-miss flows are not even stable: Luis |
| Comment by Abhijit Kumbhare [ 18/Sep/17 ] |
|
Not a blocker for Nitrogen. However, it can be rechecked after https://bugs.opendaylight.org/show_bug.cgi?id=9089 is merged. |
| Comment by Tomas Slusny [ 28/Sep/17 ] |
|
So |
| Comment by Luis Gomez [ 16/Oct/17 ] |
|
OK, I just tested on my laptop and this seems to be fixed now. |