[OPNFLWPLUG-504] flows statistics unstable when 80k flow configured Created: 16/Jun/15  Updated: 27/Sep/21  Resolved: 18/Dec/15

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Peter Gubka Assignee: Michal Rehak
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Linux
Platform: PC


Attachments: Text File karaf.log    
Issue Links:
Duplicate
is duplicated by OPNFLWPLUG-494 flow collection did not collect all f... Resolved
External issue ID: 3762
Priority: High

 Description   

used odl: distribution-karaf-0.3.0-Lithium-RC1-v201506160017.tar.gz

installed feature: odl-openflowplugin-flowservices-ui-li

When 80k flows were installed, they were not all collected, but after a mininet restart all flows were collected correctly.

Then, after some time, 4 switches were lost from the inventory.

(1434487760.13342, 63, 74567, 74120)
(1434487792.004231, 63, 79567, 79000)
(1434487821.56991, 63, 79567, 79000) HERE ONLY 79k of 80k COLLECTED
(1434487850.850809, 63, 79567, 79000)
(1434487881.234324, 63, 79567, 79000)
Could not retrieve inventory, response not in JSON format
(1434487896.83477, 0, 0, 0)
Could not retrieve inventory, response not in JSON format MININET RESTART
(1434487926.857697, 0, 0, 0)
Could not retrieve inventory, response not in JSON format
(1434487956.889432, 0, 0, 0)
(1434487991.114253, 30, 34490, 32880)
(1434488028.17617, 56, 68324, 67820) 7 SWITCHES LOST FROM INVENTORY
(1434488059.341353, 63, 80567, 80000)
(1434488089.260265, 63, 80567, 80000)
...
(1434488384.822826, 63, 80567, 80000)
(1434488419.865389, 59, 75451, 74920)
(1434488450.665859, 63, 80567, 80000)
(1434488475.408658, 63, 80567, 80000)
(1434488510.034752, 63, 80567, 80000)

This may be just another variant of the same problem reported at
https://bugs.opendaylight.org/show_bug.cgi?id=3735, but stats collection in the Lithium RC1_June15 build looked more stable than in this Lithium_RC1_June16 build.
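
For reference, the lines above come from a periodic monitoring loop that queries the controller's operational inventory over RESTCONF roughly every 30 seconds. Below is a minimal Python sketch of such a loop; the URL, credentials, and counting logic are assumptions for illustration and are not taken from the actual monitoring script.

import time

import requests  # assumed to be available on the monitoring host

# Assumed RESTCONF endpoint and credentials (illustrative only).
INVENTORY_URL = "http://127.0.0.1:8181/restconf/operational/opendaylight-inventory:nodes"
AUTH = ("admin", "admin")


def poll_once():
    """Query the operational inventory once and print a result tuple."""
    now = time.time()
    resp = requests.get(INVENTORY_URL, auth=AUTH, timeout=60)
    try:
        nodes = resp.json().get("nodes", {}).get("node", [])
    except ValueError:
        # This case yields the "(timestamp, 0, 0, 0)" lines seen above,
        # when the controller returns something that is not JSON.
        print("Could not retrieve inventory, response not in JSON format")
        print((now, 0, 0, 0))
        return
    switches = len(nodes)
    # Count flow entries listed under each table of each switch; the real
    # monitor also reports a second, statistics-based flow count (see the
    # field breakdown in the comments further down).
    flows = sum(len(table.get("flow", []))
                for node in nodes
                for table in node.get("flow-node-inventory:table", []))
    print((now, switches, flows))


if __name__ == "__main__":
    while True:
        poll_once()
        time.sleep(30)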



 Comments   
Comment by Peter Gubka [ 16/Jun/15 ]

Attachment karaf.log has been added with description: karaf log

Comment by Kavitha Ramalingam [ 06/Aug/15 ]

Hi Peter
We tried running the Python script “flow_stats_stability_monitor.py” with 1 switch and installing 80K flows. We were able to install the flows successfully. I have a few queries regarding Bug ID 3762:

• Is this problem always reproducible with 63 switches? We hit memory issues on our VMs, so we couldn’t try with 63 switches.
• You had mentioned that after restarting mininet, flows were not collected. Is this issue seen only the first time?
• (1434487760.13342, 63, 74567, 74120) – Could you please provide some background on what these fields mean, for our better understanding?
• From the output (1434487881.234324, 63, 79567, 79000), my understanding is that the third field holds the statistics count of flows added and the last one reflects the actual flows added. From this I infer that only 79K flows were added and the statistics reflect an additional 567 flows. When mininet is restarted, (1434488510.034752, 63, 80567, 80000) shows that 80K flows have been added; however, the statistics still show an additional 567. So I believe there are 2 issues: (1) 1K flows not being added, (2) statistics showing an additional 567 flows. Let me know if my understanding is right.

Thanks and Regards
Kavitha

Comment by Kavitha Ramalingam [ 06/Aug/15 ]

Also, I would like to know how OPNFLWPLUG-505 differs from this bug.

Thanks and regards
Kavitha

Comment by Kavitha Ramalingam [ 06/Aug/15 ]

I found that OPNFLWPLUG-494 looks similar to this bug.

Comment by Kavitha Ramalingam [ 12/Aug/15 ]

I used the latest image for my testing. I was able to successfully add 80K flows using the script flow_stats_stability_monitor.py, and the statistics were intact. I'll wait for Peter to get back to me so I can understand the exact problem recreation steps.

Excerpt from the output...

      • Total flows added: 80000
        HTTP[OK] results: 4000

(1439379949.190588, 63, 76640, 76000)
(1439379978.095428, 63, 80000, 80000)
(1439380014.07273, 63, 80000, 80000)
(1439380049.057831, 63, 80000, 80000)
(1439380073.981449, 63, 80000, 80000)
(1439380099.447188, 63, 80000, 80000)
(1439380130.397766, 63, 80000, 80000)
(1439380166.070169, 63, 80000, 80000)
(1439380188.765338, 63, 80000, 80000)
(1439380222.663056, 63, 80000, 80000)
(1439380254.579992, 63, 80000, 80000)
(1439380278.786276, 63, 80000, 80000)
(1439380312.380398, 63, 80000, 80000)
(1439380345.184222, 63, 80000, 80000)
(1439380370.950659, 63, 80000, 80000)
(1439380406.442469, 63, 80000, 80000)
(1439380430.161879, 63, 80000, 80000)
(1439380466.098296, 63, 80000, 80000)
(1439380491.369774, 63, 80000, 80000)
(1439380523.377546, 63, 80000, 80000)
(1439380552.993085, 63, 80000, 80000)
(1439380585.513106, 63, 80000, 80000)
(1439380608.89825, 63, 80000, 80000)
(1439380642.429276, 63, 80000, 80000)
(1439380676.319508, 63, 80000, 80000)
(1439380699.485387, 63, 80000, 80000)

Comment by Peter Gubka [ 17/Aug/15 ]

The tuple (1434487760.13342, 63, 74567, 74120) contains:
(timestamp,
number of switches present in the operational inventory,
number of flows present in the table statistics of the operational inventory datastore,
number of flow details present in the operational inventory)
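
As a rough illustration of how those two flow counts can be derived, here is a minimal sketch that walks a parsed /restconf/operational/opendaylight-inventory:nodes response. It assumes the Lithium-era inventory layout (a flow-node-inventory:table list with an opendaylight-flow-table-statistics:flow-table-statistics child), and the function name is illustrative rather than taken from the monitoring script.

def count_inventory(inventory):
    """Return (switches, flows per table statistics, flow details) from a
    parsed operational inventory response (a Python dict)."""
    nodes = inventory.get("nodes", {}).get("node", [])
    switches = len(nodes)
    stats_flows = 0   # third tuple field: sum of per-table statistics counters
    flow_details = 0  # fourth tuple field: flow entries actually listed
    for node in nodes:
        for table in node.get("flow-node-inventory:table", []):
            table_stats = table.get(
                "opendaylight-flow-table-statistics:flow-table-statistics", {})
            stats_flows += int(table_stats.get("active-flows", 0))
            flow_details += len(table.get("flow", []))
    return switches, stats_flows, flow_details

With counts like these, a line such as (1434487760.13342, 63, 74567, 74120) reads as: 63 switches, 74567 flows according to the table statistics, and 74120 flow entries present in the datastore.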

Since this bug was reported there are jobs like
https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-scale-stats-collection-daily-lithium-redesign-only-stable-lithium/

which automate the same workflow that was previously done manually. They use 100k flows and 63 switches and are automated with Robot Framework. In job builds #102 and #103 you can see that 100k flows were not collected; the failure in build #103 was:
Keyword 'Check Flows Inventory' failed after retrying for 7 minutes 30 seconds. The last error was: 100000.0 != 98421.0

You can also see that flow stats collection does not fail every time; that is why I call the stats collection "unstable". All builds in which all 11 tests passed were fine.
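
The 'Check Flows Inventory' keyword referenced above is essentially a poll-until-timeout check against the operational datastore. Below is a minimal Python sketch of that pattern, not the actual Robot keyword; the endpoint, credentials, and helper name are assumptions for illustration.

import time

import requests  # assumed to be available on the test system

INVENTORY_URL = "http://127.0.0.1:8181/restconf/operational/opendaylight-inventory:nodes"
AUTH = ("admin", "admin")  # assumed default controller credentials


def check_flows_inventory(expected_flows, timeout=450, interval=15):
    """Poll the operational inventory until it lists the expected number of
    flow entries, or raise after the timeout (450 s = 7 min 30 s)."""
    deadline = time.time() + timeout
    last_seen = None
    while time.time() < deadline:
        try:
            data = requests.get(INVENTORY_URL, auth=AUTH, timeout=60).json()
            nodes = data.get("nodes", {}).get("node", [])
            last_seen = sum(len(table.get("flow", []))
                            for node in nodes
                            for table in node.get("flow-node-inventory:table", []))
            if last_seen == expected_flows:
                return
        except (requests.RequestException, ValueError):
            pass  # transient errors are retried until the deadline
        time.sleep(interval)
    raise AssertionError("%s != %s" % (float(expected_flows), last_seen))


# Example corresponding to the failure above (expecting 100k flows):
# check_flows_inventory(100000)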

Comment by Kavitha Ramalingam [ 20/Aug/15 ]

I tried running the script more than 10 times; however, I was not able to hit the problem. It looks like a timing issue. I'll go through the code and see if there is any possible loophole.

Peter, meanwhile can you please let me know if I can get access to your test set-up, where I could try to recreate this problem?

Comment by Abhijit Kumbhare [ 25/Sep/15 ]

Any thoughts, Peter, about reproducing this?
Also added Anil to the CC. Any thoughts, Anil?

Comment by Peter Gubka [ 28/Sep/15 ]

I don't know how to reproduce this "on demand". I have been thinking about it since I reported this behavior. But looking at the yellow dots at https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-scale-stats-collection-daily-lithium-redesign-only-stable-lithium/ we can see that it happens often.

Comment by Anil Vishnoi [ 03/Oct/15 ]

Abhijit, this is related to the statistics manager in the alternate design, so I think Michal would be the best person to comment on this.

Anil

Comment by Abhijit Kumbhare [ 09/Oct/15 ]

Michal,

Can you comment?

Thanks,
Abhijit

Comment by Kamal Rameshan [ 09/Oct/15 ]

Hi Peter,

Are you running all the switches in a single mininet?

Sometimes I have seen disconnects and flows not getting properly installed in a single mininet VM.

Can you retry by distributing the switches among more than 1 VM?

Comment by Peter Gubka [ 12/Oct/15 ]

(In reply to Kamal Rameshan from comment #11)
> Hi Peter,
>
> Are you running all the switches in a single mininet?
>
> Sometimes I have seen disconnects and flows not getting properly installed
> in a single mininet VM.
>
> Can you retry by distributing the switches among more than 1 VM?

Hello.
At the moment all these tests run on a single VM in integration Jenkins.
Please note that this issue is not reproducible on demand, and preparing a local environment with switches distributed over more VMs will take time too. As I have higher-priority tasks these days, I am not really sure when I can get back to this issue.
Peter

Comment by Abhijit Kumbhare [ 30/Oct/15 ]

Changed to "normal" as this is stalled due to no easy way to reproduce the bug.

Comment by Peter Gubka [ 18/Dec/15 ]

This bug is covered by the suite
https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-scale-stats-collection-daily-lithium-redesign-only-beryllium/
and
https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-scale-stats-collection-daily-lithium-redesign-only-stable-lithium/

The particular test is "Stable State Monitoring". This test has failed in the past only when the previous test failed, because 100k flows were not present in the ds/operational.

This bug would be valid only if this particular test failed as the only failing test in this suite, and that has not happened for several months.

This bug was probably fixed as a side effect of past stats manager improvements, so I am closing it.
