<!-- 
RSS generated by JIRA (8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d) at Wed Feb 07 19:54:57 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>OpenDaylight JIRA</title>
    <link>https://jira.opendaylight.org</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>8.20.10</version>
        <build-number>820010</build-number>
        <build-date>22-06-2022</build-date>
    </build-info>


<item>
            <title>[CONTROLLER-1209] Clustering: Recovering a large journal runs out of memory</title>
                <link>https://jira.opendaylight.org/browse/CONTROLLER-1209</link>
                <project id="10113" key="CONTROLLER">controller</project>
                    <description>&lt;p&gt;I created 1 million data objects (cars) with 50 concurrent transactions each writing 20K entries in the CDS (single-node), which worked fine - memory usage stayed under 0.5G with a 2G max heap. So the in-memory journal log trimming did its job. It didn&apos;t snapshot because it never reached 20K journal entries (due to the batched modifications) and didn&apos;t trip the 70% memory threshold.&lt;/p&gt;

&lt;p&gt;However, when I restarted the controller it ran out of memory trying to recover the journal:&lt;/p&gt;

&lt;p&gt;Uncaught error from thread &lt;span class=&quot;error&quot;&gt;&amp;#91;opendaylight-cluster-data-akka.actor.default-dispatcher-18&amp;#93;&lt;/span&gt; shutting down JVM since &apos;akka.jvm-exit-on-fatal-error&apos; is enabled for ActorSystem[opendaylight-cluster-dataUncaught error from thread &lt;span class=&quot;error&quot;&gt;&amp;#91;opendaylight-cluster-data-akka.actor.default-dispatcher-24&amp;#93;&lt;/span&gt; shutting down JVM since &apos;akka.jvm-exit-on-fatal-error&apos; is enabled for ActorSystem&lt;span class=&quot;error&quot;&gt;&amp;#91;opendaylight-cluster-data&amp;#93;&lt;/span&gt;&lt;br/&gt;
Uncaught error from thread [java.lang.OutOfMemoryError: GC overhead limit exceeded&lt;br/&gt;
]&lt;br/&gt;
java.lang.OutOfMemoryError: GC overhead limit exceeded&lt;br/&gt;
        at java.lang.StringBuilder.toString(StringBuilder.java:405)&lt;br/&gt;
        at org.opendaylight.controller.cluster.datastore.node.utils.stream.NormalizedNodeInputStreamReader.readQName(NormalizedNodeInputStreamReader.java:223)&lt;br/&gt;
        at org.opendaylight.controller.cluster.datastore.node.utils.stream.NormalizedNodeInputStreamReader.readPathArgument(NormalizedNodeInputStreamReader.java:350)&lt;br/&gt;
        at org.opendaylight.controller.cluster.datastore.node.utils.stream.NormalizedNodeInputStreamReader.readYangInstanceIdentifierInternal(NormalizedNodeInputStreamReader.java:326)&lt;br/&gt;
        at org.opendaylight.controller.cluster.datastore.node.utils.stream.NormalizedNodeInputStreamReader.readYangInstanceIdentifier(NormalizedNodeInputStreamReader.java:317)&lt;br/&gt;
        at org.opendaylight.controller.cluster.datastore.utils.SerializationUtils.deserializePathAndNode(SerializationUtils.java:76)&lt;br/&gt;
        at org.opendaylight.controller.cluster.datastore.modification.WriteModification.readExternal(WriteModification.java:62)&lt;br/&gt;
        at org.opendaylight.controller.cluster.datastore.modification.MergeModification.fromStream(MergeModification.java:59)&lt;br/&gt;
        at org.opendaylight.controller.cluster.datastore.modification.MutableCompositeModification.readExternal(MutableCompositeModification.java:95)&lt;br/&gt;
        at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)&lt;br/&gt;
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)&lt;br/&gt;
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)&lt;br/&gt;
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)&lt;br/&gt;
        at org.opendaylight.controller.cluster.datastore.modification.ModificationPayload.getModification(ModificationPayload.java:46)&lt;br/&gt;
        at org.opendaylight.controller.cluster.datastore.Shard.appendRecoveredLogEntry(Shard.java:726)&lt;br/&gt;
        at org.opendaylight.controller.cluster.raft.RaftActor.batchRecoveredLogEntry(RaftActor.java:262)&lt;br/&gt;
        at org.opendaylight.controller.cluster.raft.RaftActor.onRecoveredApplyLogEntries(RaftActor.java:247)&lt;br/&gt;
        at org.opendaylight.controller.cluster.raft.RaftActor.handleRecover(RaftActor.java:178)&lt;br/&gt;
        at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveRecover(AbstractUntypedPersistentActor.java:51)&lt;br/&gt;
        at org.opendaylight.controller.cluster.datastore.Shard.onReceiveRecover(Shard.java:245)&lt;br/&gt;
        at akka.persistence.UntypedPersistentActor$$anonfun$receiveRecover$1.applyOrElse(Eventsourced.scala:433)&lt;/p&gt;

&lt;p&gt;I increased the max JVM memory to 4G and still ran out of memory.&lt;/p&gt;

&lt;p&gt;Looking at the heap dump, the currentLogRecoveryBatch retained over 2G, as did the ReplicatedLog journal (although they share the MutableCompositeModification instances). Both had 126 entries, with each MutableCompositeModification instance retaining ~21M, so it didn&apos;t get very far.&lt;/p&gt;

&lt;p&gt;We keep applying entries to the Shard&apos;s currentLogRecoveryBatch until we reach the shardJournalRecoveryLogBatchSize, at which point we queue up a transaction. One thing I noticed is that we don&apos;t clear the currentLogRecoveryBatch when we end a batch - this is a bug. However, we never hit the end of a batch before running out of memory because shardJournalRecoveryLogBatchSize is 5000 by default.&lt;/p&gt;
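
&lt;p&gt;The batching flow and the missing clear described above can be sketched roughly as follows (a hypothetical Python sketch; the class and method names are illustrative, not the actual Shard code):&lt;/p&gt;

```python
# Hypothetical sketch of the recovery batching described above; names
# (current_log_recovery_batch, apply_current_batch) are illustrative,
# not the actual Shard fields or methods.
class RecoveryBatcher:
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.current_log_recovery_batch = []
        self.applied_batches = []

    def append_recovered_entry(self, entry):
        self.current_log_recovery_batch.append(entry)
        if len(self.current_log_recovery_batch) >= self.batch_size:
            self.apply_current_batch()

    def apply_current_batch(self):
        # Submit the batch for commit (here just recorded)...
        self.applied_batches.append(list(self.current_log_recovery_batch))
        # ...then clear it -- the step whose absence lets entries
        # accumulate across batches, as noted above.
        self.current_log_recovery_batch = []
```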

&lt;p&gt;I dropped it to 10 and it got farther (136 entries) but still ran out of memory. Now the ShardRecoveryCoordinator retains ~2G, as it captures the current currentLogRecoveryBatch. The ReplicatedLog only retains ~467M because it stores serialized ModificationPayload instances, which are much smaller, ~2M each.&lt;/p&gt;

&lt;p&gt;It seems we also need to trim the in-memory log as we&apos;re recovering it. We already trim the log as transactions are committed live; in fact, for single-node we don&apos;t even retain the in-memory log.&lt;/p&gt;

&lt;p&gt;I think there are several things we need to address here. Here are some initial thoughts:&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Null out currentLogRecoveryBatch in applyCurrentLogRecoveryBatch after submitting it to the ShardRecoveryCoordinator.&lt;/li&gt;
	&lt;li&gt;Take memory usage into account when batching recovered journal entries.&lt;/li&gt;
	&lt;li&gt;Store the ModificationPayload instances in currentLogRecoveryBatch instead of the de-serialized instances; de-serialize when applying to the transaction.&lt;/li&gt;
	&lt;li&gt;The ShardRecoveryCoordinator uses an executor to prepare write transactions in parallel and then commits them all on recovery complete. Do we need to do this? It requires significant memory overhead when recovering a large journal and may not scale well. It&apos;s probably OK to commit immediately.&lt;/li&gt;
	&lt;li&gt;Lower the default shardJournalRecoveryLogBatchSize (e.g. to 100).&lt;/li&gt;
	&lt;li&gt;Trim the in-memory log as we&apos;re recovering. For single-node this is easy; for a cluster we&apos;d need to take into account the previous replicatedToAllIndex, so it would have to be persisted. ApplyJournalEntries seems like a good candidate.&lt;/li&gt;
&lt;/ul&gt;
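
&lt;p&gt;The memory-aware batching idea could look roughly like this. This is a hypothetical Python sketch; the per-entry size field and the default thresholds are assumptions, not actual controller code:&lt;/p&gt;

```python
# Illustrative sketch of memory-aware recovery batching. A batch is
# flushed when either a count cap or a byte cap is reached, so a few
# huge entries (e.g. ~21M MutableCompositeModifications) cannot blow
# up a count-only batch. The "size" field and the caps are assumed.
def batch_recovered_entries(entries, max_batch_count=100, max_batch_bytes=16_000_000):
    batches = []
    batch, batch_bytes = [], 0
    for entry in entries:
        batch.append(entry)
        batch_bytes += entry["size"]  # serialized payload size; an assumed field
        if len(batch) >= max_batch_count or batch_bytes >= max_batch_bytes:
            batches.append(batch)
            batch, batch_bytes = [], 0
    if batch:
        batches.append(batch)  # flush the final partial batch
    return batches
```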


&lt;p&gt;I can look at this more when I get back.&lt;/p&gt;</description>
                <environment>&lt;p&gt;Operating System: All&lt;br/&gt;
Platform: All&lt;/p&gt;</environment>
        <key id="25763">CONTROLLER-1209</key>
            <summary>Clustering: Recovering a large journal runs out of memory</summary>
                <type id="10104" iconUrl="https://jira.opendaylight.org/secure/viewavatar?size=xsmall&amp;avatarId=10303&amp;avatarType=issuetype">Bug</type>
                                                <status id="5" iconUrl="https://jira.opendaylight.org/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="green"/>
                                    <resolution id="10000">Done</resolution>
                                        <assignee username="tpantelis">Tom Pantelis</assignee>
                                    <reporter username="tpantelis">Tom Pantelis</reporter>
                        <labels>
                    </labels>
                <created>Sun, 15 Mar 2015 15:36:19 +0000</created>
                <updated>Tue, 25 Jul 2023 08:23:57 +0000</updated>
                            <resolved>Tue, 31 Mar 2015 14:46:11 +0000</resolved>
                                                                    <component>mdsal</component>
                        <due></due>
                            <votes>0</votes>
                                    <watches>1</watches>
                                                                                                                <comments>
                            <comment id="50271" author="tpantelis" created="Sun, 15 Mar 2015 16:47:55 +0000"  >&lt;p&gt;The main issues here are the first and third bullet items above. If we store the serialized ModificationPayload instances, which are the same instances stored in the ReplicatedLog, in the currentLogRecoveryBatch list, then it&apos;s not using any more memory than the ReplicatedLog.&lt;/p&gt;

&lt;p&gt;However, I think we&apos;ll still run into a similar problem with caching all the prepared DOM write transactions in the ShardRecoveryCoordinator until onRecoveryComplete. I think we need to immediately prepare and commit each batched transaction to free up the memory.&lt;/p&gt;</comment>
                            <comment id="50272" author="tpantelis" created="Sun, 15 Mar 2015 18:34:09 +0000"  >&lt;p&gt;I made the first 2 changes below and recovery completed successfully. There were 306 log entries, each with 20K merge mods (one for each car), so there were actually over 6M car objects total in the persisted journal (I had run my stress test multiple times), although only the last 1M were current (the test overwrites the cars container). This equates to about 612M used by the ReplicatedLog. All of the merge mods were committed in one huge transaction, which took about 1.3 minutes to complete. Memory spiked over 2.5G and eventually settled to ~2G.&lt;/p&gt;

&lt;p&gt;I then changed the ShardRecoveryCoordinator to commit each batch of log entries immediately and lowered the shardJournalRecoveryLogBatchSize to 100, so all entries would be committed in 4 transactions. This took a little over a minute. The memory profile was about the same.&lt;/p&gt;

&lt;p&gt;So trimming the in-memory log on recovery would eliminate the 612M used by the ReplicatedLog.&lt;/p&gt;

&lt;p&gt;It seems like it should&apos;ve snapshotted, though, when it originally ran with over 600M in the log. With single-node (no followers), we keep track of the dataSizeSinceLastSnapshot; however, we divide it by 5 for the memory check. This is a hard-coded heuristic so we don&apos;t snapshot too often, since we don&apos;t really need snapshots to trim the in-memory log. However, this prevents trimming the persisted log. I can try it without dividing by 5 to see at what point it would take a snapshot.&lt;/p&gt;
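
&lt;p&gt;A hypothetical reading of that heuristic, sketched in Python (the function name, parameters, and threshold semantics are illustrative, not the actual RaftActor code):&lt;/p&gt;

```python
# Hypothetical sketch of the single-node snapshot check described above.
# The divide-by-5 factor mirrors the hard-coded heuristic; everything
# else here is an assumed simplification of the memory check.
SNAPSHOT_FACTOR = 5  # damping so single-node does not snapshot too often

def should_snapshot(data_size_since_last_snapshot, threshold_bytes, single_node=True):
    effective = data_size_since_last_snapshot
    if single_node:
        # Dividing by 5 delays snapshots -- which, as noted above, also
        # delays trimming of the persisted journal.
        effective = effective // SNAPSHOT_FACTOR
    return effective >= threshold_bytes
```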

</comment>
                            <comment id="50273" author="moraja@cisco.com" created="Tue, 31 Mar 2015 14:46:11 +0000"  >&lt;p&gt;&lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/16540/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/16540/&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                            <customfield id="customfield_11400" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10208" key="com.atlassian.jira.plugin.system.customfieldtypes:textfield">
                        <customfieldname>External issue ID</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2847</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10201" key="com.atlassian.jira.plugin.system.customfieldtypes:url">
                        <customfieldname>External issue URL</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[https://bugs.opendaylight.org/show_bug.cgi?id=2847]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10206" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Issue Type</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10300"><![CDATA[Bug]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10204" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>ODL SR Target Milestone</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10315"><![CDATA[Lithium]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10000" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|i02p5r:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>