[TSDR-8] tsdr:list FlowTableStats command hangs the Karaf console on HBase DataStore Created: 02/Jun/15  Updated: 19/Jun/15  Resolved: 19/Jun/15

Status: Verified
Project: tsdr
Component/s: General
Affects Version/s: unspecified
Fix Version/s: None

Type: Bug
Reporter: Vasanthan Balasubramaniyan Assignee: YuLing Chen
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File hs_err_pid21378.log    
External issue ID: 3542
Priority: High

 Description   

Environment:
Build#2135 (Integration Build)
Topology#Tree
sudo mn --topo tree,5 --switch ovsk,protocols=OpenFlow13 --controller remote,ip=10.16.148.232

Problem:
On the HBase datastore, "tsdr:list FlowTableStats" does not return any output on the Karaf console; it hangs the console.

Steps to Reproduce:
1. Install odl-tsdr-hbase on the ODL controller.
2. Once the HBase collector has started, start the tree topology (as given in Environment) on the Mininet VM. [Note: the Mininet VM is running remotely.]
3. Wait for roughly 1 to 1.5 hours of collection.
4. Issue "tsdr:list FlowTableStats" on the Karaf console. It neither returns output nor throws an error. Waited for almost 16 minutes.

However, "tsdr:list Portstats" returns output in 75 seconds,
and "tsdr:list FlowStats" returns output in 30 seconds.

TSDR collection is not interrupted, and the HBase DB continues to be updated.

Note:
===
During the problematic period, the HBase row counts were:
FlowTableMetrics : 2255901 rows
InterfaceMetrics: 127320 rows
FlowMetrics: 27716 rows



 Comments   
Comment by Hariharan Sethuraman [ 16/Jun/15 ]

Reduced the maximum result size from the default (java.lang.Long.MAX_VALUE) to 1000 bytes. Unit tested against 4.5 lakh (450,000) entries, which took 7-8 seconds to respond. This response time is consistent irrespective of the entry count.
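
The effect described in this comment can be illustrated with a small standalone sketch. It is not the TSDR or HBase client code: `BoundedScanSketch`, `rows`, and `readBatch` are hypothetical names, and the rows are generated in memory rather than read from HBase. It only shows why a read that stops once a byte budget is reached does the same amount of work no matter how many rows the table holds. In the HBase client, a limit of this kind is usually expressed through `Scan.setMaxResultSize(long)` or the `hbase.client.scanner.max.result.size` property; which of the two the actual change used is not stated in this issue.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical illustration only; not TSDR or HBase client code. */
final class BoundedScanSketch {

    /** Lazily yields the given number of fixed-width rows (14 bytes each). */
    static Iterator<String> rows(final long total) {
        return new Iterator<String>() {
            private long next = 0;

            @Override
            public boolean hasNext() {
                return next < total;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return String.format("row-%010d", next++);
            }
        };
    }

    /**
     * Pulls rows until the accumulated size reaches maxResultBytes.
     * The last row may overshoot the budget slightly.
     */
    static List<String> readBatch(Iterator<String> source, long maxResultBytes) {
        List<String> batch = new ArrayList<>();
        long usedBytes = 0;
        while (source.hasNext() && usedBytes < maxResultBytes) {
            String row = source.next();
            batch.add(row);
            usedBytes += row.getBytes(StandardCharsets.UTF_8).length;
        }
        return batch;
    }

    public static void main(String[] args) {
        // Both calls stop after the same number of rows, whatever the table size.
        System.out.println(readBatch(rows(450_000L), 1000L).size());
        System.out.println(readBatch(rows(64_304_667L), 1000L).size());
    }
}
```

With 14-byte rows and a 1000-byte budget, both calls in `main` return 72 rows, so the cost of one read depends on the budget and not on the row count.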

Comment by Vasanthan Balasubramaniyan [ 16/Jun/15 ]

Attachment hs_err_pid21378.log has been added with description: crash hs_err_pid log

Comment by Vasanthan Balasubramaniyan [ 16/Jun/15 ]

On RC1 build: distribution-karaf-0.3.0-Lithium-RC1-v201506160017

This issue is still reproducible:
1. Sometimes it generates an OutOfMemory error.
2. Sometimes it crashes the ODL controller.

Comment by YuLing Chen [ 17/Jun/15 ]

The root cause was that 1) when inserting into HBase, we requested a new connection for each row, and 2) we didn't return the connection to the connection pool after the thread was done. This left a huge number of insertion threads in memory, which led to the OOM issue.

Fixes: 1) get one connection and insert the whole list of rows into HBase, and 2) return the connection to the pool, so that insertion threads no longer accumulate in memory and cause the OOM issue.
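
The borrow-once, always-return pattern described in this comment can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual TSDR change: `PooledBatchInsertSketch`, `Connection`, `Pool`, and `insertAll` are hypothetical stand-ins, and the "connection" only records rows in memory instead of talking to HBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical illustration only; not the TSDR HBase persistence code. */
final class PooledBatchInsertSketch {

    /** Stand-in for a datastore connection; it just records what it stored. */
    static final class Connection {
        final List<String> stored = new ArrayList<>();

        void putAll(List<String> rows) {
            stored.addAll(rows);
        }
    }

    /** Minimal fixed-size pool that fails fast when every connection is in use. */
    static final class Pool {
        private final BlockingQueue<Connection> idle;

        Pool(int size) {
            idle = new ArrayBlockingQueue<>(size);
            for (int i = 0; i < size; i++) {
                idle.add(new Connection());
            }
        }

        Connection borrow() {
            Connection c = idle.poll();
            if (c == null) {
                throw new IllegalStateException("pool exhausted");
            }
            return c;
        }

        void giveBack(Connection c) {
            idle.add(c);
        }

        int idleCount() {
            return idle.size();
        }
    }

    /** Borrows one connection for the whole list and returns it even on failure. */
    static void insertAll(Pool pool, List<String> rows) {
        Connection c = pool.borrow();
        try {
            c.putAll(rows); // one batched write instead of one connection per row
        } finally {
            pool.giveBack(c);
        }
    }
}
```

Because `giveBack` runs in a `finally` block, repeated calls to `insertAll` leave the pool at its original size. Borrowing without returning, as in the original defect, exhausts the pool after as many calls as it has connections.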

Comment by Vasanthan Balasubramaniyan [ 19/Jun/15 ]

Environment:
Build#RC1 (distribution-karaf-0.3.0-Lithium-RC1-v201506180115.zip)
Topology: Tree
sudo mn --controller=remote,ip=10.16.148.232 --topo=tree,6 --switch ovsk,protocols=OpenFlow13
DataStore: odl-tsdr-hbase

Verification details
====================
Tested with 64 ovsk switches over ~18 hours of collection.
Karaf console commands respond within 5 seconds (after 18 hours).

After ~18 hours, the datastore record counts (in HBase) were:

PortStats : 4015288 row(s)
FlowTableStats: 64304667 row(s)
FlowStats: 841882 row(s)

Generated at Wed Feb 07 20:46:07 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.