<!-- 
RSS generated by JIRA (8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d) at Wed Feb 07 19:54:54 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>OpenDaylight JIRA</title>
    <link>https://jira.opendaylight.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>8.20.10</version>
        <build-number>820010</build-number>
        <build-date>22-06-2022</build-date>
    </build-info>


<item>
            <title>[CONTROLLER-1186] Clustering : ConcurrentDOMDataBroker/DOMConcurrentDataCommitCoordinator cause deadlocks</title>
                <link>https://jira.opendaylight.org/browse/CONTROLLER-1186</link>
                <project id="10113" key="CONTROLLER">controller</project>
                    <description>&lt;p&gt;Here is the scenario,&lt;/p&gt;

&lt;p&gt;T1 and T2 are two threads which create Txn1 and Txn2. Both transactions touch both the config and operational datastores, so when Txn1 and Txn2 are submitted they have 2 cohorts each. On the CDS side we end up with 1 Txn for config and 1 Txn for operational per Broker transaction, each with 1 cohort. Let&apos;s call the cohorts Txn11, Txn12, Txn21, Txn22.&lt;/p&gt;

&lt;p&gt;When Txn1 and Txn2 are submitted the concurrent broker attempts to do a canCommit on all the cohorts of Txn1 and Txn2 concurrently. On the CDS side we therefore try to do canCommit on the cohorts for all 4 Transactions. &lt;/p&gt;

&lt;p&gt;When we do a canCommit on a CDS transaction, the cohortEntry on which the canCommit is triggered gets queued on the ShardCommitCoordinator. This cohortEntry is then removed when a commit is called on that cohort. On the DataBroker end, preCommit will only be called on the cohorts once the responses for both cohorts&apos; canCommits have been received.&lt;/p&gt;

&lt;p&gt;Because canCommit is triggered asynchronously, the cohort entries can end up queued on the ShardCommitCoordinators as follows:&lt;/p&gt;

&lt;p&gt;Txn21 -&amp;gt; Txn11 = Operational Shard Coordinator&lt;/p&gt;

&lt;p&gt;Txn12 -&amp;gt; Txn22 = Config Shard Coordinator&lt;/p&gt;


&lt;p&gt;The Operational ShardCommitCoordinator processes the first item in its queue and sends a CanCommitTransactionReply for Txn21. Similarly, the Config ShardCommitCoordinator sends the CanCommitTransactionReply for Txn12.&lt;/p&gt;

&lt;p&gt;Txn21 and Txn12 can be removed from the queue only when a commit arrives for those cohorts, but that never happens, because it would require canCommit responses for cohorts Txn11 and Txn22, which are queued behind them. Hence the deadlock.&lt;/p&gt;

&lt;p&gt;In CDS this deadlock eventually resolves through Akka timeouts.&lt;/p&gt;
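
&lt;p&gt;The queueing behaviour described above can be sketched as a small Python model. This is an illustrative sketch only: the function name simulate, the dictionary layout, and the rule that a shard replies only to the cohort at the head of its queue are assumptions drawn from this description, not the controller&apos;s actual code, and Akka timeouts are not modelled.&lt;/p&gt;

```python
from collections import deque


def simulate(shard_queues, cohorts_of):
    """Return the set of broker transactions that manage to commit.

    shard_queues maps a shard name to its cohort names in arrival order.
    cohorts_of maps a broker transaction to the cohort names it owns.
    """
    queues = {shard: deque(order) for shard, order in shard_queues.items()}
    committed = set()
    progress = True
    while progress:
        progress = False
        # Assumption: a shard replies to canCommit only for the cohort
        # that is currently at the head of its queue.
        heads = {q[0] for q in queues.values() if q}
        for txn, cohorts in cohorts_of.items():
            if txn in committed:
                continue
            # The broker proceeds to commit only once every cohort of the
            # transaction has received its canCommit reply.
            if all(c in heads for c in cohorts):
                for q in queues.values():
                    if q and q[0] in cohorts:
                        q.popleft()  # commit removes the cohort entry
                committed.add(txn)
                progress = True
    return committed


COHORTS = {"Txn1": ["Txn11", "Txn12"], "Txn2": ["Txn21", "Txn22"]}

# Ordering from this report: each shard has a different transaction first.
reported = {"operational": ["Txn21", "Txn11"], "config": ["Txn12", "Txn22"]}
print(sorted(simulate(reported, COHORTS)))  # prints []

# Same arrival order on both shards: nothing blocks.
consistent = {"operational": ["Txn11", "Txn21"], "config": ["Txn12", "Txn22"]}
print(sorted(simulate(consistent, COHORTS)))  # prints ['Txn1', 'Txn2']
```

&lt;p&gt;With the reported ordering neither transaction ever has both of its cohorts at the head of a queue, so the model makes no progress. With the same arrival order on both shards, both transactions commit.&lt;/p&gt;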

&lt;p&gt;One way to reproduce this problem is by using the netconf scale test.&lt;/p&gt;

&lt;p&gt;The manual instructions for running this test are as follows:&lt;/p&gt;


&lt;p&gt;Prerequisite: ODL with odl-restconf and odl-netconf-connector-all installed.&lt;/p&gt;

&lt;p&gt;Download the latest testtool:&lt;br/&gt;
&lt;a href=&quot;https://nexus.opendaylight.org/service/local/artifact/maven/redirect?r=opendaylight.snapshot&amp;amp;g=org.opendaylight.controller&amp;amp;a=netconf-testtool&amp;amp;v=0.3.0-SNAPSHOT&amp;amp;e=jar&amp;amp;c=executable&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://nexus.opendaylight.org/service/local/artifact/maven/redirect?r=opendaylight.snapshot&amp;amp;g=org.opendaylight.controller&amp;amp;a=netconf-testtool&amp;amp;v=0.3.0-SNAPSHOT&amp;amp;e=jar&amp;amp;c=executable&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run testtool with the following command (do not just copy, paste, and run it; make sure all the settings are correct according to the instructions below):&lt;/p&gt;

&lt;p&gt;java -Dorg.apache.sshd.registerBouncyCastle=false -Xmx8G -XX:MaxPermSize=1G -jar ./netconf-testtool-0.3.0-SNAPSHOT-executable.jar --ssh true --generate-configs-batch-size 4000 --exi false --generate-config-connection-timeout 10000000 --generate-config-address 127.0.0.1 --device-count 10000 --distribution-folder ~/data/odl/distribution-karaf-0.3.0-SNAPSHOT/ --starting-port 17830 --schemas-dir ~/data/odl/yang --debug false&lt;/p&gt;

&lt;p&gt;For an explanation of the arguments, consult the wiki:&lt;br/&gt;
&lt;a href=&quot;https://wiki.opendaylight.org/index.php?title=OpenDaylight_Controller:Netconf:Testtool#Testtool_help&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.opendaylight.org/index.php?title=OpenDaylight_Controller:Netconf:Testtool#Testtool_help&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For now you only need to set distribution-folder to the distribution that you are going to run, and schemas-dir to the directory with the YANG schemas you want the simulated devices to use. If you don&apos;t want to use any extra schemas, just remove the schemas-dir argument.&lt;/p&gt;

&lt;p&gt;After testtool is done rewriting the configs in the distro folder, you can run Karaf.&lt;br/&gt;
Make sure Karaf can run with more than 2 GB of RAM, since with extra schemas or more features installed 2 GB of RAM is not sufficient for 10k devices.&lt;/p&gt;

&lt;p&gt;You can monitor progress on RESTCONF:&lt;br/&gt;
&lt;a href=&quot;http://localhost:8181/restconf/operational/opendaylight-inventory:nodes/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://localhost:8181/restconf/operational/opendaylight-inventory:nodes/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When a device is connected, it has a &amp;lt;connected&amp;gt;true&amp;lt;/connected&amp;gt; element in the RESTCONF response.&lt;/p&gt;


&lt;p&gt;NOTE: In Helium, if you disable the concurrent commits for DistributedDataStore, this problem disappears because of the sequential coordinator.&lt;/p&gt;</description>
                <environment>&lt;p&gt;Operating System: All&lt;br/&gt;
Platform: All&lt;/p&gt;</environment>
        <key id="25740">CONTROLLER-1186</key>
            <summary>Clustering : ConcurrentDOMDataBroker/DOMConcurrentDataCommitCoordinator cause deadlocks</summary>
                <type id="10104" iconUrl="https://jira.opendaylight.org/secure/viewavatar?size=xsmall&amp;avatarId=10303&amp;avatarType=issuetype">Bug</type>
                                                <status id="5" iconUrl="https://jira.opendaylight.org/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="green"/>
                                    <resolution id="10000">Done</resolution>
                                        <assignee username="moraja@cisco.com">Moiz Raja</assignee>
                                    <reporter username="moraja@cisco.com">Moiz Raja</reporter>
                        <labels>
                    </labels>
                <created>Fri, 6 Mar 2015 07:23:42 +0000</created>
                <updated>Mon, 30 Mar 2015 21:15:11 +0000</updated>
                            <resolved>Mon, 30 Mar 2015 21:15:11 +0000</resolved>
                                    <version>Post-Helium</version>
                                                    <component>mdsal</component>
                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                                                                <comments>
                            <comment id="50202" author="moraja@cisco.com" created="Wed, 25 Mar 2015 23:31:49 +0000"  >&lt;p&gt;&lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/17136/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/17136/&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                            <customfield id="customfield_11400" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10208" key="com.atlassian.jira.plugin.system.customfieldtypes:textfield">
                        <customfieldname>External issue ID</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2792</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10201" key="com.atlassian.jira.plugin.system.customfieldtypes:url">
                        <customfieldname>External issue URL</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[https://bugs.opendaylight.org/show_bug.cgi?id=2792]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10206" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Issue Type</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10300"><![CDATA[Bug]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10204" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>ODL SR Target Milestone</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10315"><![CDATA[Lithium]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10202" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Priority</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10313"><![CDATA[Highest]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10000" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|i02p0n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>