<!-- 
RSS generated by JIRA (8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d) at Wed Feb 07 19:55:52 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>OpenDaylight JIRA</title>
    <link>https://jira.opendaylight.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>8.20.10</version>
        <build-number>820010</build-number>
        <build-date>22-06-2022</build-date>
    </build-info>


<item>
            <title>[CONTROLLER-1566] The mailbox of actor for data tree change listener should use limited-size queue</title>
                <link>https://jira.opendaylight.org/browse/CONTROLLER-1566</link>
                <project id="10113" key="CONTROLLER">controller</project>
                    <description>&lt;p&gt;We ran into a problem where the JVM heap was used up while running the OpenDaylight controller. The cause was that the data tree change listeners consumed messages much more slowly than the shard data trees produced them. At the same time, the mailbox of the data tree change listener actor uses an unlimited-size queue, so the data tree change messages accumulated without limit until the JVM heap was used up. &lt;/p&gt;

&lt;p&gt;After we limited the size of the mailbox queue, the JVM no longer got stuck in full garbage collections of a used-up heap, and it stayed stable no matter how slowly the data tree change listeners consumed their data tree change messages. &lt;/p&gt;

&lt;p&gt;Additionally, we added a new metric for the data tree change listener actor to track the number of messages dropped due to the limited-size queue. The attachment is a screenshot showing an example of this metric.&lt;/p&gt;</description>
                <environment>&lt;p&gt;Operating System: All&lt;br/&gt;
Platform: All&lt;/p&gt;</environment>
        <key id="26120">CONTROLLER-1566</key>
            <summary>The mailbox of actor for data tree change listener should use limited-size queue</summary>
                <type id="10104" iconUrl="https://jira.opendaylight.org/secure/viewavatar?size=xsmall&amp;avatarId=10303&amp;avatarType=issuetype">Bug</type>
                                                <status id="5" iconUrl="https://jira.opendaylight.org/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="green"/>
                                    <resolution id="10001">Won&apos;t Do</resolution>
                                        <assignee username="yin.kangqian@zte.com.cn">Kangqian Yin</assignee>
                                    <reporter username="yin.kangqian@zte.com.cn">Kangqian Yin</reporter>
                        <labels>
                    </labels>
                <created>Wed, 7 Dec 2016 07:12:49 +0000</created>
                <updated>Mon, 19 Dec 2016 02:33:01 +0000</updated>
                            <resolved>Mon, 19 Dec 2016 02:33:01 +0000</resolved>
                                    <version>0.4.0</version>
                                                    <component>clustering</component>
                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                                                                <comments>
                            <comment id="51622" author="yin.kangqian@zte.com.cn" created="Wed, 7 Dec 2016 07:12:49 +0000"  >&lt;p&gt;Attachment limited-length-mailbox-queue-not-handled.png has been added with description: New metric of &quot;not-handled&quot; for actors with mailbox type of &quot;bounded-mailbox&quot;&lt;/p&gt;</comment>
                            <comment id="51611" author="vrpolak" created="Fri, 9 Dec 2016 13:11:40 +0000"  >&lt;p&gt;&amp;gt; After we limited the size of the mailbox queue&lt;/p&gt;

&lt;p&gt;This sounds like you already have a fix ready. If you are willing to get this fix merged:&lt;br/&gt;
0. Click &quot;take&quot; in &quot;Assigned To&quot; field of this Bug and set &quot;Status&quot; to &quot;in progress&quot; to let other people know this Bug has a person dedicated to fixing it.&lt;br/&gt;
1. Upload your patch to Gerrit and add Controller committers as reviewers.&lt;br/&gt;
2. Post a comment here with the Gerrit link and set &quot;Status&quot; to &quot;waiting for review&quot;.&lt;br/&gt;
3. Work with reviewers to get your patch merged, then set status to &quot;fixed&quot;.&lt;/p&gt;

&lt;p&gt;There are more verbose guides in various subpages of &lt;a href=&quot;https://wiki.opendaylight.org/view/BestPractices&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.opendaylight.org/view/BestPractices&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="51612" author="tpantelis" created="Tue, 13 Dec 2016 03:12:06 +0000"  >&lt;p&gt;The problem with using a bounded mailbox is that change notifications will get dropped which likely isn&apos;t good either. You really should look into why the data tree change listener isn&apos;t keeping up and try to alleviate that (e.g. is it doing blocking operation(s) that could be done async?). Also maybe the front-end client(s) are producing transactions too fast.&lt;/p&gt;</comment>
                            <comment id="51613" author="vrpolak" created="Tue, 13 Dec 2016 15:45:53 +0000"  >&lt;p&gt;&amp;gt; dropped messages due to the limited-size queue&lt;/p&gt;

&lt;p&gt;Oh, I missed that. No application expects data tree changes to be dropped silently.&lt;/p&gt;

&lt;p&gt;Would it be possible to block the committing thread (instead of dropping message) until queue is writable? Or would that lead to deadlocks (or some other queue exploding)?&lt;/p&gt;</comment>
                            <comment id="51614" author="tpantelis" created="Tue, 13 Dec 2016 16:05:18 +0000"  >&lt;p&gt;(In reply to Vratko Pol&#225;k from comment #3)&lt;br/&gt;
&amp;gt; &amp;gt; dropped messages due to the limited-size queue&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; Oh, I missed that. No application expects data tree changes to be dropped&lt;br/&gt;
&amp;gt; silently.&lt;br/&gt;
&amp;gt; &lt;br/&gt;
&amp;gt; Would it be possible to block the committing thread (instead of dropping&lt;br/&gt;
&amp;gt; message) until queue is writable? Or would that lead to deadlocks (or some&lt;br/&gt;
&amp;gt; other queue exploding)?&lt;/p&gt;

&lt;p&gt;Back pressure is the ideal way to do it but in this case the committing thread would block the shard from processing subsequent messages which isn&apos;t ideal. One slow listener could block raft heartbeats and subsequent transactions.&lt;/p&gt;

&lt;p&gt;If a listener may do expensive processing or block I think it should offload that from the listener notification thread. If the queue or thread pool gets inundated then the listener logic can decide how best to handle that.&lt;/p&gt;</comment>
                            <comment id="51615" author="yin.kangqian@zte.com.cn" created="Wed, 14 Dec 2016 02:32:44 +0000"  >&lt;p&gt;Just as you said, it&apos;s unacceptable to drop data tree change messages silently when the message queue is full. So I would add a metric that counts dropped messages by overriding the class MeteredBoundedMailbox.MeteredMessageQueue. Applications can monitor this metric to see whether their listeners work slowly or lose some critical data tree change messages.&lt;/p&gt;

&lt;p&gt;By means of this metric, we have found that our bgp listeners consume data tree changes more slowly than the processing of bgp packets produces them. We&apos;re now trying to offload the changes to a new queue and consolidate them.&lt;/p&gt;

&lt;p&gt;The disadvantage of limiting the queue size that I&apos;ve met is that it blocks the shard actor for the duration of a timeout when the shard tries to tell a data tree change message to a full message queue. That will decrease the throughput of ODL&apos;s distributed datastore. However, this is a form of back-pressure on the producers.&lt;/p&gt;</comment>
                            <comment id="51616" author="rovarga" created="Fri, 16 Dec 2016 13:00:00 +0000"  >&lt;p&gt;This is a more complicated problem. The first part is the transport, which is the inverse of BUG-5280, i.e. the solution can use the same mechanism.&lt;/p&gt;

&lt;p&gt;As for the backpressure problem, we cannot allow the system to grind to a halt simply because of a bug in some obscure application. To that end we have introduced DOMDataTreeListener, which has two things going for it:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;proper error callback, i.e. &quot;you cannot keep up, I killed you, restart if you wish&quot;&lt;/li&gt;
	&lt;li&gt;ability to allow notifications to be state-compressed, hence doing a similar thing PingPongDataBroker does&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;These three combined can be used to:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;not overflow actor inbox (by exerting backpressure)&lt;/li&gt;
	&lt;li&gt;keep the number of notifications low (state-compress them when they start piling up)&lt;/li&gt;
	&lt;li&gt;if all else fails kill the listener&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="51617" author="rovarga" created="Fri, 16 Dec 2016 13:26:05 +0000"  >&lt;p&gt;A further note: Data{Tree}ChangeListeners API contract does not allow a message to be dropped, as that amounts to data corruption.&lt;/p&gt;</comment>
                            <comment id="51618" author="yin.kangqian@zte.com.cn" created="Sat, 17 Dec 2016 02:54:09 +0000"  >&lt;p&gt;You&apos;re right. Data corruption in the application is unavoidable if the application consumes messages more slowly than the distributed datastore produces them. &lt;/p&gt;

&lt;p&gt;After we limited the listener queue length, the bgp listeners could not work correctly because they lost a lot of data. &lt;/p&gt;

&lt;p&gt;However, this will force applications such as the bgp listeners to find a way to catch up with the producer. This is a good thing, because it will improve the quality and throughput of the whole system.&lt;/p&gt;

&lt;p&gt;Given the complexity of applications, I don&apos;t think the distributed datastore can fulfill the DataTreeChangeListener contract of no lost messages and no data corruption.&lt;/p&gt;

&lt;p&gt;The reasonable thing the distributed datastore can do about this issue is to keep as many messages as the JVM heap allows, rather than a fixed number of messages.&lt;/p&gt;</comment>
                            <comment id="51619" author="yin.kangqian@zte.com.cn" created="Sat, 17 Dec 2016 10:10:23 +0000"  >&lt;p&gt;&lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/49518/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/49518/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="51620" author="rovarga" created="Sat, 17 Dec 2016 14:31:09 +0000"  >&lt;p&gt;Pushing the problem out to applications is unacceptable, simply because we do not give them the tools to solve the problem and it will lead to massive amounts of code duplication in the application layer, with the various workarounds having different kinds of bugs.&lt;/p&gt;

&lt;p&gt;To illustrate on the specific example of BGP listeners, how do you propose they solve the problem, when they are completely unaware of the fact that the implementation has chosen to break the DTCN stream and hence the sum of observed state no longer matches what is in the data store?&lt;/p&gt;

&lt;p&gt;I do not agree with your assessment of not being able to fulfill a reasonable interface contract (where DOMDataTreeListener is reasonable). I believe we have all the tools we need to get the job done.&lt;/p&gt;

&lt;p&gt;Is your assessment based on the CDS architecture or its implementation deficiencies?&lt;/p&gt;</comment>
                            <comment id="51621" author="yin.kangqian@zte.com.cn" created="Mon, 19 Dec 2016 02:06:02 +0000"  >&lt;p&gt;Hi Robert, what is the CDS architecture? My assessment of the bgp listeners comes from performance testing of our bgp application, which is customized from ODL&apos;s bgpcep project.&lt;/p&gt;

&lt;p&gt;Currently, I don&apos;t have the time or ability to implement the much better solution you described in &quot;Comment 6&quot;.&lt;/p&gt;

&lt;p&gt;However, I look forward to you or Tom implementing that solution to avoid using up the JVM heap.&lt;/p&gt;

&lt;p&gt;Until we can use that solution, I have to limit the queue size, because using up the JVM heap is much more dangerous. We&apos;ve suffered it once, and we don&apos;t want to suffer it again.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="13597" name="limited-length-mailbox-queue-not-handled.png" size="69419" author="yin.kangqian@zte.com.cn" created="Wed, 7 Dec 2016 07:12:49 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                            <customfield id="customfield_11400" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10208" key="com.atlassian.jira.plugin.system.customfieldtypes:textfield">
                        <customfieldname>External issue ID</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>7314</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10201" key="com.atlassian.jira.plugin.system.customfieldtypes:url">
                        <customfieldname>External issue URL</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[https://bugs.opendaylight.org/show_bug.cgi?id=7314]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10000" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|i02rd3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>