[OPNFLWPLUG-442] redesign ofp inventory loses switches after a few thousand flows are configured; often a problem to connect 25 switches Created: 18/May/15  Updated: 27/Sep/21  Resolved: 27/May/15

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Peter Gubka Assignee: Moiz Raja
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: File flow-1000.xml     Zip Archive karaf.log.zip    
Issue Links:
Blocks
blocks OPNFLWPLUG-429 Milestone: OpenFlow Plugin Redesign -... Resolved
blocks OPNFLWPLUG-443 new rpc call update-table stuck Resolved
blocks OVSDB-156 Clustering: Errors if controller star... Resolved
External issue ID: 3249
Priority: Highest

 Description   

This job https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-cds-config-performance-lithium-redesign-only-master runs a script that configures 10k flows on 25 switches, then tries to collect stats, removes the flows, and collects stats again.

Looking at the results from May 15 until now, we often have a problem connecting even 25 switches without any flows.

During the script run, we lose nodes from the inventory.
Output before adding flows:
Cycle 0:
Thread 3:
Adding 2000 flows on 25 nodes
Thread 4:
Adding 2000 flows on 25 nodes
Thread 1:
Adding 2000 flows on 25 nodes
Thread 0:
Adding 2000 flows on 25 nodes
Thread 2:
Adding 2000 flows on 25 nodes

Output before deleting flows:
Cycle 0:
Thread 2: Deleting 2000 flows on 4 nodes
Thread 3: Deleting 2000 flows on 4 nodes
Thread 0: Deleting 2000 flows on 4 nodes
Thread 1: Deleting 2000 flows on 4 nodes
Thread 4: Deleting 2000 flows on 4 nodes
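
The drop in node count can be read directly from the two outputs above. A minimal Python sketch, assuming the script keeps exactly the "Adding/Deleting N flows on M nodes" wording shown here; the function name node_counts is hypothetical and not part of the test script:

```python
import re

# Matches lines such as "Adding 2000 flows on 25 nodes" or
# "Thread 2: Deleting 2000 flows on 4 nodes" from the script output.
LINE_RE = re.compile(r"(Adding|Deleting) (\d+) flows on (\d+) nodes")


def node_counts(output: str) -> dict:
    """Return the set of node counts reported for each phase."""
    counts = {"Adding": set(), "Deleting": set()}
    for match in LINE_RE.finditer(output):
        phase, _flows, nodes = match.groups()
        counts[phase].add(int(nodes))
    return counts
```

For the output above this gives 25 nodes in the "Adding" phase and 4 in the "Deleting" phase, i.e. 21 nodes disappeared from the inventory during the run.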

The logs contain many WARN and ERROR messages (with stack traces) with varying message content.

The first NOK behaviour was noticed in this job:

https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-cds-config-performance-lithium-redesign-only-master/45/

which uses:
http://nexus01.dfw.opendaylight.org:8081/nexus/content/repositories/opendaylight.snapshot/org/opendaylight/integration/distribution-karaf/0.3.0-SNAPSHOT/distribution-karaf-0.3.0-20150515.195941-1579.zip

The last OK behaviour was in
https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-cds-config-performance-lithium-redesign-only-master/44/

with image
http://nexus01.dfw.opendaylight.org:8081/nexus/content/repositories/opendaylight.snapshot/org/opendaylight/integration/distribution-karaf/0.3.0-SNAPSHOT/distribution-karaf-0.3.0-20150515.192049-1576.zip



 Comments   
Comment by Peter Gubka [ 18/May/15 ]

Attachment karaf.log.zip has been added with description: karaf.log.zip

Comment by Jozef Gloncak [ 18/May/15 ]

On Friday, 15 May 2015, I was able to successfully send 3000 flows to an OF switch and, via the flow statistics reply, store them in the operational datastore.

Today I checked out the commit of my earlier build (from when this was possible), 39ca97e292ee5348cac005fe35b7973dfe8a1627 (Ensuring exclusive access to flow registry.). I built it again and am now getting a lot of warning messages in the log:

2015-05-18 14:03:55,393 | WARN | entLoopGroup-8-3 | TransactionProxy | 179 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Failed to acquire operation permit for transaction member-1-txn-81733

I suppose the problem could be in the controller. I will try to build karaf with the artifacts from Friday.

Comment by Jozef Gloncak [ 18/May/15 ]

The same log entry is also observed with the
odl-openflowplugin-flow-services-rest
feature installed.

Comment by Moiz Raja [ 18/May/15 ]

Could we please add steps to reproduce in here. I am assuming we want 25 switches connected and then the flow blaster script to be run.

The links to the jobs are not working.

Comment by Peter Gubka [ 19/May/15 ]

(In reply to Moiz Raja from comment #3)
> Could we please add steps to reproduce in here. I am assuming we want 25
> switches connected and then the flow blaster script to be run.
>
> The links to the jobs are not working.

The links are not working because the Jenkins history has "moved" forward. In general, you can take the results of any later job:

https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-cds-config-performance-lithium-redesign-only-master/

Comment by Peter Gubka [ 19/May/15 ]

(In reply to Moiz Raja from comment #3)
> Could we please add steps to reproduce in here. I am assuming we want 25
> switches connected and then the flow blaster script to be run.
>
> The links to the jobs are not working.

Regarding steps to reproduce: you can take any build with a "yellow" ball and check its robot results. The test "Are Switches Connected" fails because the 25 switches managed by mininet are not present in the operational inventory.

So the steps are: connect 25 switches with mininet and check the nodes in the operational inventory.
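
The inventory check can be sketched in Python as follows. This is only an illustration under stated assumptions: the RESTCONF path, port 8181, and the admin/admin credentials are the usual defaults for a Lithium-era distribution and are not taken from the job configuration, and the function names are hypothetical. The payload layout (a "nodes" container holding a "node" list keyed by "id") follows the opendaylight-inventory model.

```python
import base64
import json
import urllib.request

# Assumption: Lithium-era RESTCONF path for the operational inventory.
INVENTORY_PATH = "/restconf/operational/opendaylight-inventory:nodes"


def openflow_node_ids(payload: dict) -> list:
    """Extract the ids of OpenFlow nodes (lexically sorted) from inventory JSON."""
    nodes = (payload.get("nodes") or {}).get("node") or []
    return sorted(n["id"] for n in nodes if n.get("id", "").startswith("openflow:"))


def fetch_inventory(host: str, port: int = 8181,
                    user: str = "admin", password: str = "admin") -> dict:
    """GET the operational inventory; host, port and credentials are assumptions."""
    request = urllib.request.Request(f"http://{host}:{port}{INVENTORY_PATH}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def check_switches(payload: dict, expected: int = 25) -> bool:
    """True when exactly the expected number of OpenFlow nodes is present."""
    return len(openflow_node_ids(payload)) == expected
```

Calling check_switches(fetch_inventory("<controller-ip>")) after mininet has started would then correspond to the "Are Switches Connected" check described above.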

Comment by Jozef Gloncak [ 19/May/15 ]

steps to reproduce:

  • start openflowplugin karaf distribution
  • feature:install odl-openflowplugin-flow-services-rest-li
  • start mininet (with e.g. one switch)
  • PUT a batch of flows (in my example 1000 flows, see the attachment)

The following warning then starts to occur repeatedly in the karaf log:
2015-05-19 08:46:03,754 | WARN | entLoopGroup-8-8 | TransactionProxy | 182 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.SNAPSHOT | Failed to acquire operation permit for transaction member-1-txn-187409
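
To see how often such a warning repeats, the karaf log can be tallied by level, logger, and message. A rough Python sketch, assuming every entry uses the six "|"-separated fields of the line quoted above; the function name tally_warnings is hypothetical:

```python
import re
from collections import Counter

# Transaction ids differ on every line; mask them so repeats group together.
TXN_RE = re.compile(r"member-\d+-txn-\d+")


def tally_warnings(lines, levels=("WARN", "ERROR")) -> Counter:
    """Count log entries per (level, logger, normalized message)."""
    tally = Counter()
    for line in lines:
        # timestamp | level | thread | logger | bundle | message
        fields = [field.strip() for field in line.split("|", 5)]
        if len(fields) != 6:
            continue  # stack-trace or continuation line, not a log entry
        _timestamp, level, _thread, logger, _bundle, message = fields
        if level in levels:
            key = (level, logger, TXN_RE.sub("member-N-txn-N", message))
            tally[key] += 1
    return tally
```

Passing the open karaf.log file to tally_warnings and printing most_common(10) of the result lists the most frequent WARN and ERROR messages.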

Another symptom is a problematic connection between the controller and the switch. The controller gradually stops responding. At the beginning the controller responds to OFPT_HELLO, but in the end no response is received from the controller.

When inserting flows one by one (tested with three flows), this problem doesn't occur.

Comment by Jozef Gloncak [ 19/May/15 ]

Attachment flow-1000.xml has been added with description: 1000 flows

Comment by Jozef Gloncak [ 19/May/15 ]

After merging this patch
https://git.opendaylight.org/gerrit/#/c/20680/

a new problem has occurred.

When the config datastore contains a batch of flows (in my L2SWITCH-40), they are successfully sent to the device after it is turned on. The statistics also bring this information to the controller, but it isn't written to the database.

The karaf log is flooded with WARN messages:
2015-05-19 13:05:12,395 | WARN | entLoopGroup-8-5 | TransactionChainManager | 198 - org.opendaylight.openflowplugin.impl - 0.1.0.SNAPSHOT | failed to put into writeOnlyTransaction : Previous transaction member-1-txn-407757 is not ready yet

Comment by Peter Gubka [ 27/May/15 ]

The initial problem is the OVS versioning problem discussed in
https://lists.opendaylight.org/pipermail/integration-dev/2015-May/003282.html

The last problem mentioned by Jozef does not occur at this time.

Therefore "closing" the bug.

Generated at Wed Feb 07 20:32:29 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.