[OVSDB-469] Very high String allocation pressure on GC due to org.opendaylight.ovsdb.lib.notation.Version.fromString() Created: 20/Sep/18  Updated: 12/Nov/18  Resolved: 02/Oct/18

Status: Resolved
Project: ovsdb
Component/s: None
Affects Version/s: None
Fix Version/s: Oxygen-SR3, Fluorine

Type: Bug Priority: Highest
Reporter: Michael Vorburger Assignee: Michael Vorburger
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates
relates to CONTROLLER-1864 A 8s "non-GC world stop JVM pause" du... Resolved
Priority: Highest

 Description   

I was suprised to see the stack trace below as the top #3 memory allocator in the JFR from CONTROLLER-1864 showing up as 1 GB Total Allocation of String, that's a hell of a lot of objects for what org.opendaylight.ovsdb.lib.notation.Version.fromString() does!

thapar or skitt or any other OVSDB maintainers, how about rewriting that simpler and without allocation?

String java.lang.String.substring(int, int)	2558
CharSequence java.lang.String.subSequence(int, int)	1653
CharSequence java.util.regex.Matcher.getSubSequence(int, int)	1610
String java.util.regex.Matcher.group(int)	1610
Version org.opendaylight.ovsdb.lib.notation.Version.fromString(String)	1567
Version org.opendaylight.ovsdb.lib.schema.typed.TyperUtils.getColumnUntilVersion(Method)	857
void org.opendaylight.ovsdb.lib.schema.typed.TyperUtils.checkColumnSchemaVersion(DatabaseSchema, Method)	857
void org.opendaylight.ovsdb.lib.schema.typed.TyperUtils.access$100(DatabaseSchema, Method)	857
Object org.opendaylight.ovsdb.lib.schema.typed.TyperUtils$1.processGetColumn(Method)	768
Object org.opendaylight.ovsdb.lib.schema.typed.TyperUtils$1.invoke(Object, Method, Object[])	768
Column com.sun.proxy.$Proxy463.getLocatorColumn()	243
RemoteUcastMacs org.opendaylight.ovsdb.hwvtepsouthbound.transactions.md.HwvtepUcastMacsRemoteUpdateCommand.buildRemoteUcast(UcastMacsRemote)	243
Node org.opendaylight.ovsdb.hwvtepsouthbound.transactions.md.HwvtepUcastMacsRemoteUpdateCommand.buildConnectionNode(Collection)	243
void org.opendaylight.ovsdb.hwvtepsouthbound.transactions.md.HwvtepUcastMacsRemoteUpdateCommand.updateUcastMacsRemote(ReadWriteTransaction, Collection)	243
void org.opendaylight.ovsdb.hwvtepsouthbound.transactions.md.HwvtepUcastMacsRemoteUpdateCommand.execute(ReadWriteTransaction)	243
void org.opendaylight.ovsdb.hwvtepsouthbound.transactions.md.HwvtepOperationalCommandAggregator.execute(ReadWriteTransaction)	243
void org.opendaylight.ovsdb.hwvtepsouthbound.transactions.md.TransactionInvokerImpl.run()	243
void java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)	243
void java.util.concurrent.ThreadPoolExecutor$Worker.run()	243
void java.lang.Thread.run()	243


 Comments   
Comment by Michael Vorburger [ 20/Sep/18 ]

and the top #6 overall memory allocation, with 538 MB of int[] is this, which will also disappear together with above:

void java.util.regex.Matcher.<init>(Pattern, CharSequence)	4943
Matcher java.util.regex.Pattern.matcher(CharSequence)	4938
Version org.opendaylight.ovsdb.lib.notation.Version.fromString(String)	2960
Version org.opendaylight.ovsdb.lib.schema.typed.TyperUtils.getColumnUntilVersion(Method)	1428
void org.opendaylight.ovsdb.lib.schema.typed.TyperUtils.checkColumnSchemaVersion(DatabaseSchema, Method)	1428
void org.opendaylight.ovsdb.lib.schema.typed.TyperUtils.access$100(DatabaseSchema, Method)	1428
Object org.opendaylight.ovsdb.lib.schema.typed.TyperUtils$1.processGetColumn(Method)	1009
Object org.opendaylight.ovsdb.lib.schema.typed.TyperUtils$1.invoke(Object, Method, Object[])	1009
Column com.sun.proxy.$Proxy463.getLogicalSwitchColumn()	385
RemoteUcastMacs org.opendaylight.ovsdb.hwvtepsouthbound.transactions.md.HwvtepUcastMacsRemoteUpdateCommand.buildRemoteUcast(UcastMacsRemote)	385
Node org.opendaylight.ovsdb.hwvtepsouthbound.transactions.md.HwvtepUcastMacsRemoteUpdateCommand.buildConnectionNode(Collection)	385
void org.opendaylight.ovsdb.hwvtepsouthbound.transactions.md.HwvtepUcastMacsRemoteUpdateCommand.updateUcastMacsRemote(ReadWriteTransaction, Collection)	385
void org.opendaylight.ovsdb.hwvtepsouthbound.transactions.md.HwvtepUcastMacsRemoteUpdateCommand.execute(ReadWriteTransaction)	385
void org.opendaylight.ovsdb.hwvtepsouthbound.transactions.md.HwvtepOperationalCommandAggregator.execute(ReadWriteTransaction)	385
void org.opendaylight.ovsdb.hwvtepsouthbound.transactions.md.TransactionInvokerImpl.run()	385
void java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)	385
void java.util.concurrent.ThreadPoolExecutor$Worker.run()	385
void java.lang.Thread.run()	385
Comment by Vishal Thapar [ 20/Sep/18 ]

I believe these are from JFRs shared by E/// team? This is from HWVTEP. This column lists all macs that are available to this particular port. The way schema is designed and based on deployment, this will be every other mac in network*N where N is number of TOR devices.

I am not sure if fromString is the culprit here, the string in ColumnVersion will be something like 1.7.12, it is literally the schema version string. It is the data in this column that can grow huge.

Are we sure it is the fromVersion that is doing huge allocation and not contents of column itself?

Comment by Michael Vorburger [ 20/Sep/18 ]

Yes, JFRs shared by E/// team (CONTROLLER-1864), and Yes the wait I interpret the stack trace shown it's the fromVersion that is doing huge allocation .. I'm not sure what exactly you mean by "contents of column itself", but the stack trace which I see in JMC (above) should look different if it wasn't about the allocation in fromVersion? At the worst, the change I just proposed (c/76290) cannot hurt, and should more that looks different in a future JFR.

Comment by Vishal Thapar [ 20/Sep/18 ]

Then something else is wrong. FromVersion can't and shouldn't be doing huge allocations coz version string will never be anything bigger than 'xxx.yyy.zzz'. How can a string like that cause huge allocations?

For every table and column in OVSDB we define a from,to versions [e.g. 1.0.0, 2.0.0] to determine if a given switch supports a given column or not. This caode is just to check if column is supported or not. This simple version comparison to see if column falls within range or not.

If issue were too many small allocations, I'd understand, but huge allocation is just not possible here. We're missing something significant.

Comment by Michael Vorburger [ 20/Sep/18 ]

Nope, as a wrote, the issue is 1 GB but not of a huge String (I never said that, you interpreted) of many small, substring() String; specifically x3 for each part of the Version. What is the harm in getting c/76290 merged and seeing? It cannot hurt, at worst you are right and it doesn't help, at best I'm right and this overallocation disappears. Worth a try!

Comment by Vishal Thapar [ 20/Sep/18 ]

Aha, makes sense. Sorry for confusion.

Trading allocation for performance hit which definitely will be there? There is a reason pattern was used in first place. I'd prefer running tests with a distribution zip off the patch, getting results before we merge.

Worst case is not that this doesn't help. Worst case is it makes everything too slow and scale numbers get even worse.

EDIT:
Am trying to come up with a patch that can significantly reduce the number of times this method needs to be called to the point where we won't end up with so huge allocation. I think I might be able to figure out a redesign that will give best results.

Comment by Michael Vorburger [ 20/Sep/18 ]

> performance hit which definitely will be there? There is a reason pattern was used in first place.

What? What makes you say that? The change proposed in c/76290 will, if anything, make Version fromString() faster, not slower... FYI that parse() method is directly inspired by JDK's Integer.parseInt.  skitt perhaps you would like to weight in here? 

> I'd prefer running tests with a distribution zip off the patch, getting results before we merge

sure, always a good idea, can you CSIT it? Shout if anything broken (due to this), doubt it.

Comment by Stephen Kitt [ 21/Sep/18 ]

I think Michael’s patch is definitely worth it, thanks. It’s a shame parseInt() can’t deal with substring parsing itself! NumberFormat can, but it’s too expensive.

Comment by Michael Vorburger [ 02/Oct/18 ]

I've back-ported (manual cherry-pick and conflict resolution) https://git.opendaylight.org/gerrit/#/c/76290/ from Neon master to Fluorine (https://git.opendaylight.org/gerrit/#/c/76568) and Oxygen (https://git.opendaylight.org/gerrit/#/c/76569).

Generated at Wed Feb 07 20:36:29 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.