[NETCONF-88] Memory leak when mounting/unmounting devices Created: 22/Oct/15  Updated: 15/Mar/19  Resolved: 13/Jan/16

Status: Resolved
Project: netconf
Component/s: netconf
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Jozef Behran Assignee: Robert Varga
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: leakwatch.sh, multimount.sh
External issue ID: 4514

 Description   

When a device is mounted on NETCONF through RESTCONF, memory is leaked. The first mount leaks about 0.9 MB, and each subsequent mount request increases the amount leaked by about 8 KB. The amount leaked by a mount therefore depends on the total number of mount operations executed so far and is not affected by device unmounts.

Device unmounts also leak memory, but they do not add to the amount leaked by each mount. The amount leaked by an unmount depends on the number of devices currently mounted: when only one device is present and it is deleted, roughly 300 KB is leaked. For each 100 devices mounted at the time of the unmount, the leak increases by roughly 800 KB.

To reproduce the basic form of the leak, use the attached script (leakwatch.sh). It mounts and then unmounts a device, then reports the memory leaked so far, the memory leaked by the last mount/unmount pair, and whether any errors were encountered. An optional repeat count can be given on the command line; it defaults to 1.
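The contents of leakwatch.sh are not reproduced in this export. As a rough, hypothetical sketch of the measurement loop described above (in Python rather than shell), assuming the caller supplies `mount`, `unmount` and `used_memory` callables, for example RESTCONF requests and a heap-usage probe on the ODL JVM. None of these names come from the original script:

```python
from typing import Callable, List, Tuple


def watch_leak(
    mount: Callable[[], bool],        # hypothetical: issue one mount request, True on success
    unmount: Callable[[], bool],      # hypothetical: issue one unmount request, True on success
    used_memory: Callable[[], int],   # hypothetical: current memory usage of the controller
    repeat: int = 1,                  # number of mount/unmount pairs, defaults to 1
) -> List[Tuple[int, int, bool]]:
    """Run `repeat` mount/unmount pairs and record memory growth.

    Returns one (leaked_so_far, leaked_by_last_pair, ok) tuple per pair.
    Memory figures are in whatever unit `used_memory` reports; `ok` is
    False if either request in the pair reported an error.
    """
    baseline = used_memory()
    previous = baseline
    results = []
    for _ in range(repeat):
        ok = mount()
        # Always attempt the unmount, even if the mount reported an error.
        ok = unmount() and ok
        current = used_memory()
        results.append((current - baseline, current - previous, ok))
        previous = current
    return results
```

Because `used_memory` is sampled only once per pair, the per-pair figure depends on when the JVM happens to collect garbage, which matters given the timing effects reported later in this issue.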

This bug is present in both Lithium and Beryllium.



 Comments   
Comment by Jozef Behran [ 22/Oct/15 ]

Attachment leakwatch.sh has been added with description: Tool to bombard ODL Netconf with device mount/unmount requests via Restconf

Comment by Jozef Behran [ 22/Oct/15 ]

Additional information: the leak occurs even when the device is not actually present (the connection fails). The testtool is therefore not necessary, and it does not matter which port is configured for the device.

The bug also occurs with the default configuration of ODL; just install the odl-restconf and odl-netconf-connector-all features.

Comment by Jozef Behran [ 22/Oct/15 ]

Use this utility to mount a bunch of devices and then unmount them. The default size of the bunch is 100; to use a different size, pass it on the command line.

Comment by Jozef Behran [ 22/Oct/15 ]

Attachment multimount.sh has been added with description: Utility to mount and then unmount a bunch of devices

Comment by Jozef Behran [ 26/Oct/15 ]

The change https://git.opendaylight.org/gerrit/#/c/28709/ gave some hope of improving matters, but it turns out it had no effect on the leak.

Comment by Jozef Behran [ 26/Oct/15 ]

UPDATE: It turned out the bug behaves differently than originally reported. The leak roughly depends on the count of devices currently mounted, but the dependency is somewhat "lagging" and timing dependent, which gives the illusion of the behavior described in the original report, especially when the mount-unmount operations are run in quick succession. Running mount-unmount multiple times in quick succession tends to leak more memory than letting ODL "cool down" for a few tens of seconds after each mount-unmount pair.

Occasionally (very rarely) the leak is 2-3 times larger than the count of mounted devices suggests. Occasionally (very rarely, though more commonly than the previous case) there is no memory leak at all. In that case, all I can confirm is that memory was not growing at that time: either the leak did not happen, or memory did leak but something unrelated was garbage collected, hiding the leak.

Comment by Tomas Cere [ 26/Nov/15 ]

This seems to happen only when configuring via RESTCONF with the config-subsystem loopback; with initial configs from testtool I can get 5k devices without any issues.
Most likely this is related to the way RESTCONF works (it reads the contents of the entire list to check whether the configuration is present, and then inserts the config).

Comment by Jozef Behran [ 02/Dec/15 ]

> Most likely this is related to the way restconf works
> (read contents of entire list to check if configuration
> is present, and then insert config)

...and keeps a reference to that "entire list" somewhere it stays reachable even after RESTCONF is done with it, so the garbage collector cannot collect it and these lists keep accumulating.

Maybe memory profiling will give more answers to this.
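To make the suspected mechanism concrete, here is a hypothetical Python illustration. It is not OpenDaylight or RESTCONF code; `ConfigStore`, `_snapshots` and `retained_entries` are invented names. Each mount or unmount reads a copy of the entire device list to check for existence, and a bug lets every copy escape into a long-lived container, so the memory retained per request grows with the number of devices currently mounted:

```python
from typing import Dict, List

# Long-lived, module-level container: anything appended here stays strongly
# reachable, so the garbage collector can never reclaim it.
_snapshots: List[List[str]] = []


class ConfigStore:
    """Hypothetical store mimicking a read-entire-list-then-insert flow."""

    def __init__(self) -> None:
        self._devices: Dict[str, dict] = {}

    def _read_entire_list(self) -> List[str]:
        snapshot = list(self._devices)  # copy of the whole device list
        _snapshots.append(snapshot)     # BUG: the reference escapes the request
        return snapshot

    def mount(self, name: str) -> bool:
        # Existence check done by reading the full list first.
        if name in self._read_entire_list():
            return False
        self._devices[name] = {"name": name}
        return True

    def unmount(self, name: str) -> bool:
        if name not in self._read_entire_list():
            return False
        del self._devices[name]
        return True


def retained_entries() -> int:
    """Total list entries kept alive by the leaked snapshots."""
    return sum(len(s) for s in _snapshots)
```

Mounting three devices into an empty store retains 0 + 1 + 2 = 3 list entries, and unmounting one of them while all three are mounted retains 3 more, so the per-request leak scales with the current device count. Deleting the line marked BUG lets each snapshot become unreachable as soon as the request returns. This only illustrates the hypothesis; it does not show what the linked Gerrit changes actually modified.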

Comment by Robert Varga [ 02/Dec/15 ]

Be: https://git.opendaylight.org/gerrit/30487

This affects Lithium too, so it should be backported.

Comment by Robert Varga [ 14/Dec/15 ]

Further fix: https://git.opendaylight.org/gerrit/#/c/31075/

Comment by Jozef Behran [ 08/Jan/16 ]

Tested and confirmed to be gone.

Comment by Jozef Behran [ 08/Jan/16 ]

These patches need to be merged to kill the bug (in the indicated order):

https://git.opendaylight.org/gerrit/31808
https://git.opendaylight.org/gerrit/31858

Comment by Robert Varga [ 08/Jan/16 ]

Li: https://git.opendaylight.org/gerrit/32277

Generated at Wed Feb 07 20:14:08 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.