<!-- 
RSS generated by JIRA (8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d) at Wed Feb 07 19:56:20 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>OpenDaylight JIRA</title>
    <link>https://jira.opendaylight.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>8.20.10</version>
        <build-number>820010</build-number>
        <build-date>22-06-2022</build-date>
    </build-info>


<item>
            <title>[CONTROLLER-1745] produce-transactions can get stuck when closing itemProducer</title>
                <link>https://jira.opendaylight.org/browse/CONTROLLER-1745</link>
                <project id="10113" key="CONTROLLER">controller</project>
                    <description>&lt;p&gt;This is a fork of &lt;a href=&quot;https://jira.opendaylight.org/browse/CONTROLLER-1687&quot; title=&quot;Transaction writer does not fail expected time after isolation&quot; class=&quot;issue-link&quot; data-issue-key=&quot;CONTROLLER-1687&quot;&gt;&lt;del&gt;CONTROLLER-1687&lt;/del&gt;&lt;/a&gt; as new investigation suggests the cause of this Bug is different than the previously fixed &lt;a href=&quot;https://jira.opendaylight.org/browse/CONTROLLER-1687&quot; title=&quot;Transaction writer does not fail expected time after isolation&quot; class=&quot;issue-link&quot; data-issue-key=&quot;CONTROLLER-1687&quot;&gt;&lt;del&gt;CONTROLLER-1687&lt;/del&gt;&lt;/a&gt; cause.&lt;/p&gt;

&lt;p&gt;Filing against Controller/clustering, even though the guilty project might be Mdsal instead.&lt;/p&gt;

&lt;p&gt;This affects one CSIT test case &lt;span class=&quot;error&quot;&gt;&amp;#91;34&amp;#93;&lt;/span&gt;, but the probability of failure is less than 10%.&lt;/p&gt;

&lt;p&gt;The test calls write-transactions on a shard follower, then isolates that member, then waits some time (115 s) expecting no failure, and then some more time (20 s) expecting a failure (from RequestTimeoutException). If this Bug appears, write-transactions does not return anything within a reasonable time.&lt;/p&gt;

&lt;p&gt;Copied from &lt;a href=&quot;https://jira.opendaylight.org/browse/CONTROLLER-1687&quot; title=&quot;Transaction writer does not fail expected time after isolation&quot; class=&quot;issue-link&quot; data-issue-key=&quot;CONTROLLER-1687&quot;&gt;&lt;del&gt;CONTROLLER-1687&lt;/del&gt;&lt;/a&gt; comments:&lt;/p&gt;

&lt;p&gt;&amp;gt; I will try to reproduce with more logging.&lt;/p&gt;

&lt;p&gt;Reproduced. This time the karaf.log &lt;span class=&quot;error&quot;&gt;&amp;#91;39&amp;#93;&lt;/span&gt; shows that the progress stops when closing itemProducer. Will try to repeat with even more logs &lt;span class=&quot;error&quot;&gt;&amp;#91;40&amp;#93;&lt;/span&gt;.&lt;/p&gt;

&lt;p&gt;2017-07-26 17:28:10,129 | TRACE | pool-31-thread-2 | ProduceTransactionsHandler       | 257 - org.opendaylight.controller.samples.clustering-it-provider - 1.5.2.SNAPSHOT | org.opendaylight.controller.clustering.it.provider.impl.ProduceTransactionsHandler&lt;br/&gt;
@30b30b0 Entering runFailed.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;34&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/794/log.html.gz#s1-s34-t7-k2-k16-k1-k1&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/794/log.html.gz#s1-s34-t7-k2-k16-k1-k1&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;39&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-ci-only-carbon/18/odl1_karaf.log.gz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-ci-only-carbon/18/odl1_karaf.log.gz&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;40&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://git.opendaylight.org/gerrit/#/c/60783/1&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/#/c/60783/1&lt;/a&gt;&lt;/p&gt;</description>
                <environment>&lt;p&gt;Operating System: All&lt;br/&gt;
Platform: All&lt;/p&gt;</environment>
        <key id="26299">CONTROLLER-1745</key>
            <summary>produce-transactions can get stuck when closing itemProducer</summary>
                <type id="10104" iconUrl="https://jira.opendaylight.org/secure/viewavatar?size=xsmall&amp;avatarId=10303&amp;avatarType=issuetype">Bug</type>
                                                <status id="5" iconUrl="https://jira.opendaylight.org/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="green"/>
                                    <resolution id="10000">Done</resolution>
                                        <assignee username="rovarga">Robert Varga</assignee>
                                    <reporter username="vrpolak">Vratko Polak</reporter>
                        <labels>
                    </labels>
                <created>Thu, 27 Jul 2017 09:44:20 +0000</created>
                <updated>Tue, 25 Jul 2023 08:24:45 +0000</updated>
                            <resolved>Thu, 24 Aug 2017 12:36:31 +0000</resolved>
                                                                    <component>clustering</component>
                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                                                                <comments>
                            <comment id="52526" author="vrpolak" created="Fri, 28 Jul 2017 17:11:01 +0000"  >&lt;p&gt;&amp;gt; Will try to repeat with even more logs &lt;span class=&quot;error&quot;&gt;&amp;#91;40&amp;#93;&lt;/span&gt;.&lt;/p&gt;

&lt;p&gt;Finally got that repeated.&lt;br/&gt;
The big log &lt;span class=&quot;error&quot;&gt;&amp;#91;41&amp;#93;&lt;/span&gt; will get deleted in half a day, so I am going to paste long segments here.&lt;/p&gt;


&lt;p&gt;First, here is an example of transaction submit going right:&lt;/p&gt;

&lt;p&gt;2017-07-28 14:10:35,348 | DEBUG | pool-30-thread-1 | ProduceTransactionsHandler       | 257 - org.opendaylight.controller.samples.clustering-it-provider - 1.5.2.SNAPSHOT | Inserting item: 88373&lt;br/&gt;
2017-07-28 14:10:35,348 | DEBUG | pool-30-thread-1 | ProxyHistory                     | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.2.SNAPSHOT | Allocated proxy member-2-datastore-Shard-id-ints!-fe-0-chn-5-txn-613-0 for transaction member-2-datastore-Shard-id-ints!-fe-0-chn-5-txn-613-0&lt;br/&gt;
2017-07-28 14:10:35,348 | DEBUG | pool-30-thread-1 | AbstractProxyTransaction         | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.2.SNAPSHOT | Transaction RemoteProxyTransaction{identifier=member-2-datastore-Shard-id-ints!-fe-0-chn-5-txn-613-0, state=OPEN} allocated sequence 0&lt;br/&gt;
2017-07-28 14:10:35,348 | TRACE | pool-30-thread-1 | ShardedDOMDataTreeProducer       | 147 - org.opendaylight.mdsal.dom-broker - 2.2.2.SNAPSHOT | org.opendaylight.mdsal.dom.broker.ShardedDOMDataTreeProducer@4884f6f8 synchronizing for null lastTx in transactionSubmitted&lt;br/&gt;
2017-07-28 14:10:35,348 | TRACE | pool-30-thread-1 | ShardedDOMDataTreeProducer       | 147 - org.opendaylight.mdsal.dom-broker - 2.2.2.SNAPSHOT | org.opendaylight.mdsal.dom.broker.ShardedDOMDataTreeProducer@4884f6f8 synchronized in transactionSubmitted, submitting&lt;br/&gt;
2017-07-28 14:10:35,348 | DEBUG | pool-30-thread-1 | ardedDOMDataTreeWriteTransaction | 147 - org.opendaylight.mdsal.dom-broker - 2.2.2.SNAPSHOT | Readying tx SHARDED-DOM-613&lt;br/&gt;
2017-07-28 14:10:35,348 | DEBUG | pool-30-thread-1 | AbstractProxyTransaction         | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.2.SNAPSHOT | Transaction proxy RemoteProxyTransaction{identifier=member-2-datastore-Shard-id-ints!-fe-0-chn-5-txn-613-0, state=OPEN} sending request ModifyTransactionRequest{target=member-2-datastore-Shard-id-ints!-fe-0-chn-5-txn-613-0, sequence=0, replyTo=Actor[akka://opendaylight-cluster-data/user/$m#-2004618990], modifications=1, protocol=READY} callback org.opendaylight.controller.cluster.databroker.actors.dds.RemoteProxyTransaction$$Lambda$141/838048079@50466bcd&lt;br/&gt;
2017-07-28 14:10:35,348 | DEBUG | pool-30-thread-1 | TransmitQueue                    | 197 - org.opendaylight.controller.cds-access-client - 1.1.2.SNAPSHOT | Queue org.opendaylight.controller.cluster.access.client.TransmitQueue$Transmitting@130ebfcf transmitting entry ConnectionEntry{request=ModifyTransactionRequest{target=member-2-datastore-Shard-id-ints!-fe-0-chn-5-txn-613-0, sequence=0, replyTo=Actor[akka://opendaylight-cluster-data/user/$m#-2004618990], modifications=1, protocol=READY}, enqueuedTicks=503857260922}&lt;br/&gt;
2017-07-28 14:10:35,348 | DEBUG | pool-30-thread-1 | AbstractClientHistory            | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.2.SNAPSHOT | Client history org.opendaylight.controller.cluster.databroker.actors.dds.ClientLocalHistory@5f1578c5 changed state from TX_OPEN to IDLE&lt;br/&gt;
2017-07-28 14:10:35,348 | DEBUG | pool-30-thread-1 | AbstractClientHistory            | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.2.SNAPSHOT | Local history org.opendaylight.controller.cluster.databroker.actors.dds.ClientLocalHistory@5f1578c5 readied transaction member-2-datastore-Shard-id-ints!-fe-0-chn-5-txn-613-0&lt;br/&gt;
2017-07-28 14:10:35,348 | DEBUG | pool-30-thread-1 | AbstractProxyTransaction         | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.2.SNAPSHOT | Transaction RemoteProxyTransaction{identifier=member-2-datastore-Shard-id-ints!-fe-0-chn-5-txn-613-0, state=FLUSHED} allocated sequence 1&lt;br/&gt;
2017-07-28 14:10:35,348 | DEBUG | pool-30-thread-1 | AbstractProxyTransaction         | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.2.SNAPSHOT | Transaction proxy RemoteProxyTransaction{identifier=member-2-datastore-Shard-id-ints!-fe-0-chn-5-txn-613-0, state=FLUSHED} sending request ModifyTransactionRequest{target=member-2-datastore-Shard-id-ints!-fe-0-chn-5-txn-613-0, sequence=1, replyTo=Actor[akka://opendaylight-cluster-data/user/$m#-2004618990], modifications=0, protocol=SIMPLE} callback org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction$$Lambda$144/2040119293@41f2f4ef&lt;br/&gt;
2017-07-28 14:10:35,349 | DEBUG | pool-30-thread-1 | TransmitQueue                    | 197 - org.opendaylight.controller.cds-access-client - 1.1.2.SNAPSHOT | Queue org.opendaylight.controller.cluster.access.client.TransmitQueue$Transmitting@130ebfcf transmitting entry ConnectionEntry{request=ModifyTransactionRequest{target=member-2-datastore-Shard-id-ints!-fe-0-chn-5-txn-613-0, sequence=1, replyTo=Actor[akka://opendaylight-cluster-data/user/$m#-2004618990], modifications=0, protocol=SIMPLE}, enqueuedTicks=503857373572}&lt;br/&gt;
2017-07-28 14:10:35,349 | TRACE | pool-30-thread-1 | ShardedDOMDataTreeProducer       | 147 - org.opendaylight.mdsal.dom-broker - 2.2.2.SNAPSHOT | org.opendaylight.mdsal.dom.broker.ShardedDOMDataTreeProducer@4884f6f8 submitted in transactionSubmitted, returning&lt;br/&gt;
2017-07-28 14:10:35,349 | DEBUG | pool-30-thread-1 | AbstractTransactionHandler       | 257 - org.opendaylight.controller.samples.clustering-it-provider - 1.5.2.SNAPSHOT | org.opendaylight.controller.clustering.it.provider.impl.ProduceTransactionsHandler@1c238bc8 New future #5504 allocated&lt;/p&gt;



&lt;p&gt;Compare to the final (possible) deadlock:&lt;/p&gt;

&lt;p&gt;2017-07-28 14:12:35,377 | DEBUG | pool-30-thread-1 | ProduceTransactionsHandler       | 257 - org.opendaylight.controller.samples.clustering-it-provider - 1.5.2.SNAPSHOT | Inserting item: 216844&lt;br/&gt;
2017-07-28 14:12:35,377 | TRACE | pool-30-thread-1 | ShardedDOMDataTreeProducer       | 147 - org.opendaylight.mdsal.dom-broker - 2.2.2.SNAPSHOT | org.opendaylight.mdsal.dom.broker.ShardedDOMDataTreeProducer@4884f6f8 synchronizing for null lastTx in transactionSubmitted&lt;br/&gt;
2017-07-28 14:12:35,377 | TRACE | pool-30-thread-1 | ShardedDOMDataTreeProducer       | 147 - org.opendaylight.mdsal.dom-broker - 2.2.2.SNAPSHOT | org.opendaylight.mdsal.dom.broker.ShardedDOMDataTreeProducer@4884f6f8 synchronized in transactionSubmitted, submitting&lt;br/&gt;
2017-07-28 14:12:35,377 | DEBUG | pool-30-thread-1 | ardedDOMDataTreeWriteTransaction | 147 - org.opendaylight.mdsal.dom-broker - 2.2.2.SNAPSHOT | Readying tx SHARDED-DOM-614&lt;br/&gt;
2017-07-28 14:12:35,377 | DEBUG | lt-dispatcher-26 | AbstractProxyTransaction         | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.2.SNAPSHOT | Transaction RemoteProxyTransaction{identifier=member-2-datastore-Shard-id-ints!-fe-0-chn-5-txn-613-0, state=FLUSHED} directCommit completed&lt;br/&gt;
2017-07-28 14:12:35,377 | DEBUG | lt-dispatcher-26 | AbstractProxyTransaction         | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.2.SNAPSHOT | RemoteProxyTransaction{identifier=member-2-datastore-Shard-id-ints!-fe-0-chn-5-txn-613-0, state=FLUSHED}: initiating purge&lt;br/&gt;
2017-07-28 14:12:35,377 | DEBUG | pool-30-thread-1 | AbstractProxyTransaction         | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.2.SNAPSHOT | Transaction proxy RemoteProxyTransaction{identifier=member-2-datastore-Shard-id-ints!-fe-0-chn-5-txn-614-0, state=OPEN} sending request ModifyTransactionRequest{target=member-2-datastore-Shard-id-ints!-fe-0-chn-5-txn-614-0, sequence=120, replyTo=Actor[akka://opendaylight-cluster-data/user/$m#-2004618990], modifications=37, protocol=READY} callback org.opendaylight.controller.cluster.databroker.actors.dds.RemoteProxyTransaction$$Lambda$141/838048079@7768cd2e&lt;br/&gt;
2017-07-28 14:12:35,377 | DEBUG | lt-dispatcher-26 | AbstractProxyTransaction         | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.2.SNAPSHOT | Transaction RemoteProxyTransaction{identifier=member-2-datastore-Shard-id-ints!-fe-0-chn-5-txn-613-0, state=DONE} allocated sequence 2&lt;br/&gt;
2017-07-28 14:12:35,377 | DEBUG | pool-31-thread-2 | AbstractTransactionHandler       | 257 - org.opendaylight.controller.samples.clustering-it-provider - 1.5.2.SNAPSHOT | org.opendaylight.controller.clustering.it.provider.impl.ProduceTransactionsHandler@1c238bc8 Future #5504 failed in state RUNNING cause org.opendaylight.controller.cluster.access.client.RequestTimeoutException: Timed out after 120.023840912seconds&lt;br/&gt;
2017-07-28 14:12:35,377 | DEBUG | lt-dispatcher-26 | AbstractProxyTransaction         | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.2.SNAPSHOT | Transaction proxy RemoteProxyTransaction{identifier=member-2-datastore-Shard-id-ints!-fe-0-chn-5-txn-613-0, state=DONE} enqueing request TransactionPurgeRequest{target=member-2-datastore-Shard-id-ints!-fe-0-chn-5-txn-613-0, sequence=2, replyTo=Actor[akka://opendaylight-cluster-data/user/$m#-2004618990]} callback org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction$$Lambda$153/1233784439@539a6d2f&lt;br/&gt;
2017-07-28 14:12:35,377 | DEBUG | pool-30-thread-1 | AbstractClientConnection         | 197 - org.opendaylight.controller.cds-access-client - 1.1.2.SNAPSHOT | member-2-frontend-datastore-Shard-id-ints!: connection ConnectedClientConnection{client=ClientIdentifier{frontend=member-2-frontend-datastore-Shard-id-ints!, generation=0}, cookie=0, backend=ShardBackendInfo{actor=Actor[akka.tcp://opendaylight-cluster-data@10.29.15.124:2550/user/shardmanager-config/member-1-shard-id-ints!-config#-145628703], sessionId=0, version=BORON, maxMessages=1000, cookie=0, shard=id-ints!, dataTree=absent}} has a successor, not scheduling timer&lt;br/&gt;
2017-07-28 14:12:35,378 | TRACE | pool-31-thread-2 | AbstractTransactionHandler       | 257 - org.opendaylight.controller.samples.clustering-it-provider - 1.5.2.SNAPSHOT | Removing the future.&lt;br/&gt;
2017-07-28 14:12:35,378 | TRACE | pool-31-thread-2 | AbstractTransactionHandler       | 257 - org.opendaylight.controller.samples.clustering-it-provider - 1.5.2.SNAPSHOT | The future has been removed.&lt;br/&gt;
2017-07-28 14:12:35,378 | TRACE | pool-31-thread-2 | AbstractTransactionHandler       | 257 - org.opendaylight.controller.samples.clustering-it-provider - 1.5.2.SNAPSHOT | Local state RUNNING&lt;br/&gt;
2017-07-28 14:12:35,378 | TRACE | pool-31-thread-2 | AbstractTransactionHandler       | 257 - org.opendaylight.controller.samples.clustering-it-provider - 1.5.2.SNAPSHOT | Setting state to FAILED.&lt;br/&gt;
2017-07-28 14:12:35,378 | TRACE | pool-31-thread-2 | AbstractTransactionHandler       | 257 - org.opendaylight.controller.samples.clustering-it-provider - 1.5.2.SNAPSHOT | Cancelling writingFuture.&lt;br/&gt;
2017-07-28 14:12:35,378 | TRACE | pool-31-thread-2 | AbstractTransactionHandler       | 257 - org.opendaylight.controller.samples.clustering-it-provider - 1.5.2.SNAPSHOT | Calling runFailed&lt;br/&gt;
2017-07-28 14:12:35,378 | TRACE | pool-31-thread-2 | ProduceTransactionsHandler       | 257 - org.opendaylight.controller.samples.clustering-it-provider - 1.5.2.SNAPSHOT | org.opendaylight.controller.clustering.it.provider.impl.ProduceTransactionsHandler@1c238bc8 Entering runFailed.&lt;br/&gt;
2017-07-28 14:12:35,378 | TRACE | pool-31-thread-2 | ShardedDOMDataTreeProducer       | 147 - org.opendaylight.mdsal.dom-broker - 2.2.2.SNAPSHOT | org.opendaylight.mdsal.dom.broker.ShardedDOMDataTreeProducer@4884f6f8 entering close&lt;br/&gt;
2017-07-28 14:12:35,378 | TRACE | pool-31-thread-2 | ShardedDOMDataTreeProducer       | 147 - org.opendaylight.mdsal.dom-broker - 2.2.2.SNAPSHOT | org.opendaylight.mdsal.dom.broker.ShardedDOMDataTreeProducer@4884f6f8 close doing compare and set&lt;br/&gt;
2017-07-28 14:12:35,378 | TRACE | pool-31-thread-2 | ShardedDOMDataTreeProducer       | 147 - org.opendaylight.mdsal.dom-broker - 2.2.2.SNAPSHOT | org.opendaylight.mdsal.dom.broker.ShardedDOMDataTreeProducer@4884f6f8 close set, synchronizing&lt;/p&gt;


&lt;p&gt;Here, the threads are:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;thread pool-30-thread-1 : writingExecutor thread&lt;/li&gt;
	&lt;li&gt;pool-31-thread-2 : completingExecutor thread&lt;/li&gt;
	&lt;li&gt;lt-dispatcher-26 : the actor thread which detects transaction timeouts (I think)&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;The common part is up to &quot;RemoteProxyTransaction {} sending request&quot;.&lt;br/&gt;
Then (instead of &quot;TransmitQueue {} transmitting entry&quot;), pool-30-thread-1 only did &quot;has a successor, not scheduling timer&quot;,&lt;br/&gt;
which I believe is called from AbstractClientConnection#commonEnqueue when queue.isEmpty(), and after that, nothing.&lt;/p&gt;

&lt;p&gt;ShardedDOMDataTreeProducer@4884f6f8 remained in synchronized state, which explains why the close method got stuck on synchronize as well.&lt;/p&gt;
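
&lt;p&gt;To illustrate that last point, here is a minimal Java sketch, not taken from the mdsal sources. The Producer class, its submit and close methods, and the thread names are hypothetical stand-ins; the only assumption is that the submit path and close share one object monitor, as the trace suggests. While the writing thread sits inside the synchronized submit, the thread calling close should show up as BLOCKED.&lt;/p&gt;

```java
import java.util.concurrent.CountDownLatch;

public class BlockedCloseSketch {
    // Hypothetical stand-in: submit path and close() are guarded by the same monitor.
    static class Producer {
        synchronized void submit(CountDownLatch entered, CountDownLatch release)
                throws InterruptedException {
            entered.countDown();   // signal that the monitor is now held
            release.await();       // simulate a submit that does not return on its own
        }

        synchronized void close() {
            // Nothing to do; entering this method already requires the monitor.
        }
    }

    // Returns the last observed state of the closing thread while the writer holds the monitor.
    static Thread.State observeCloseState() throws InterruptedException {
        Producer producer = new Producer();
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread writer = new Thread(() -> {
            try {
                producer.submit(entered, release);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "writing-thread");
        Thread closer = new Thread(producer::close, "completing-thread");

        writer.start();
        entered.await();           // the writer is now inside submit()
        closer.start();

        // Poll for up to about two seconds until the closer is parked on the monitor.
        Thread.State state = closer.getState();
        for (int attempt = 0; attempt != 200; attempt++) {
            state = closer.getState();
            if (state == Thread.State.BLOCKED) {
                break;
            }
            Thread.sleep(10);
        }

        release.countDown();       // let the writer finish so both threads can exit
        writer.join();
        closer.join();
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("close() thread state: " + observeCloseState());
    }
}
```

&lt;p&gt;In the sketch the writer is released at the end; in the failing run nothing releases pool-30-thread-1, so the close call never gets past the synchronize step.&lt;/p&gt;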

&lt;p&gt;Any ideas where to put even more logs?&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;41&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-ci-only-carbon/81/odl2_karaf.log.gz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-ci-only-carbon/81/odl2_karaf.log.gz&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52527" author="rovarga" created="Sat, 29 Jul 2017 10:15:23 +0000"  >&lt;p&gt;This looks like it&apos;s pointing towards an AB/BA deadlock. It would be best to obtain a thread dump when this thing is stuck &amp;#8211; that way we know which threads are holding locks and what is being waited for...&lt;/p&gt;</comment>
                            <comment id="52528" author="vrpolak" created="Mon, 31 Jul 2017 12:29:25 +0000"  >&lt;p&gt;&amp;gt; obtain a thread dump&lt;/p&gt;

&lt;p&gt;Got it: &lt;span class=&quot;error&quot;&gt;&amp;#91;42&amp;#93;&lt;/span&gt;. Karaf log for context: &lt;span class=&quot;error&quot;&gt;&amp;#91;43&amp;#93;&lt;/span&gt;.&lt;/p&gt;

&lt;p&gt;Copying the deadlock report, once again pool-30-thread-1 is the writingExecutor thread and pool-31-thread-2 is the completingExecutor thread:&lt;/p&gt;

&lt;p&gt;Found one Java-level deadlock:&lt;br/&gt;
=============================&lt;br/&gt;
&quot;pool-31-thread-2&quot;:&lt;br/&gt;
  waiting to lock monitor 0x00007f93880b4d58 (object 0x00000000856472b8, a org.opendaylight.mdsal.dom.broker.ShardedDOMDataTreeProducer),&lt;br/&gt;
  which is held by &quot;pool-30-thread-1&quot;&lt;br/&gt;
&quot;pool-30-thread-1&quot;:&lt;br/&gt;
  waiting for ownable synchronizer 0x00000000f5c7f750, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),&lt;br/&gt;
  which is held by &quot;opendaylight-cluster-data-akka.actor.default-dispatcher-3&quot;&lt;br/&gt;
&quot;opendaylight-cluster-data-akka.actor.default-dispatcher-3&quot;:&lt;br/&gt;
  waiting for ownable synchronizer 0x00000000854dedf0, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),&lt;br/&gt;
  which is held by &quot;pool-30-thread-1&quot;&lt;/p&gt;
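
&lt;p&gt;The cycle between pool-30-thread-1 and the dispatcher thread in the report above has the shape of two ReentrantLocks taken in opposite order. A minimal Java sketch of that shape follows, together with a programmatic check using the JDK ThreadMXBean. The class name, LOCK_A, LOCK_B and the thread names are hypothetical and are not the actual AbstractClientConnection locks; the sketch only assumes a JVM that supports ownable synchronizer monitoring.&lt;/p&gt;

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class AbBaDeadlockSketch {
    // Hypothetical stand-ins for the two connection locks seen in the dump.
    static final ReentrantLock LOCK_A = new ReentrantLock();
    static final ReentrantLock LOCK_B = new ReentrantLock();

    // Takes 'first', waits until the peer holds its own first lock, then asks for 'second'.
    static Thread worker(String name, ReentrantLock first, ReentrantLock second,
                         CountDownLatch bothHoldFirst) {
        Thread thread = new Thread(() -> {
            first.lock();
            try {
                bothHoldFirst.countDown();
                bothHoldFirst.await();
                second.lockInterruptibly();   // parks forever once the cycle is closed
                second.unlock();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                first.unlock();
            }
        }, name);
        thread.setDaemon(true);
        return thread;
    }

    // Builds the AB/BA cycle and returns how many threads the JVM reports as deadlocked.
    static int countDeadlockedThreads() throws InterruptedException {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread writer = worker("writer", LOCK_A, LOCK_B, bothHoldFirst);
        Thread timer = worker("timer", LOCK_B, LOCK_A, bothHoldFirst);
        writer.start();
        timer.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = null;
        // Poll for up to about five seconds; the call returns null while no cycle exists.
        for (int attempt = 0; attempt != 100; attempt++) {
            ids = threads.findDeadlockedThreads();
            if (ids != null) {
                break;
            }
            Thread.sleep(50);
        }
        int found = ids == null ? 0 : ids.length;

        // Break the cycle: lockInterruptibly reacts to interruption, unlike lock().
        writer.interrupt();
        timer.interrupt();
        writer.join();
        timer.join();
        return found;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

&lt;p&gt;The sketch uses lockInterruptibly only so that it can clean up after itself; the stack traces here show plain ReentrantLock.lock, which cannot be interrupted out of the wait.&lt;/p&gt;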

&lt;p&gt;Java stack information for the threads listed above:&lt;br/&gt;
===================================================&lt;br/&gt;
&quot;pool-31-thread-2&quot;:&lt;br/&gt;
	at org.opendaylight.mdsal.dom.broker.ShardedDOMDataTreeProducer.close(ShardedDOMDataTreeProducer.java:209)&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;waiting to lock &amp;lt;0x00000000856472b8&amp;gt; (a org.opendaylight.mdsal.dom.broker.ShardedDOMDataTreeProducer)&lt;br/&gt;
	at org.opendaylight.controller.cluster.sharding.DistributedShardedDOMDataTree$ProxyProducer.close(DistributedShardedDOMDataTree.java:679)&lt;br/&gt;
	at org.opendaylight.controller.clustering.it.provider.impl.ProduceTransactionsHandler.closeProducer(ProduceTransactionsHandler.java:98)&lt;br/&gt;
	at org.opendaylight.controller.clustering.it.provider.impl.ProduceTransactionsHandler.runFailed(ProduceTransactionsHandler.java:135)&lt;br/&gt;
	at org.opendaylight.controller.clustering.it.provider.impl.AbstractTransactionHandler.txFailure(AbstractTransactionHandler.java:201)&lt;br/&gt;
	at org.opendaylight.controller.clustering.it.provider.impl.AbstractTransactionHandler$1.onFailure(AbstractTransactionHandler.java:136)&lt;br/&gt;
	at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)&lt;br/&gt;
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)&lt;br/&gt;
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)&lt;br/&gt;
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)&lt;br/&gt;
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)&lt;br/&gt;
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)&lt;br/&gt;
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)&lt;br/&gt;
	at java.lang.Thread.run(Thread.java:748)&lt;br/&gt;
&quot;pool-30-thread-1&quot;:&lt;br/&gt;
	at sun.misc.Unsafe.park(Native Method)&lt;/li&gt;
	&lt;li&gt;parking to wait for  &amp;lt;0x00000000f5c7f750&amp;gt; (a java.util.concurrent.locks.ReentrantLock$NonfairSync)&lt;br/&gt;
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)&lt;br/&gt;
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)&lt;br/&gt;
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)&lt;br/&gt;
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)&lt;br/&gt;
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)&lt;br/&gt;
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)&lt;br/&gt;
	at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.enqueueOrForward(AbstractClientConnection.java:175)&lt;br/&gt;
	at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.sendEntry(AbstractClientConnection.java:257)&lt;br/&gt;
	at org.opendaylight.controller.cluster.access.client.SimpleReconnectForwarder.forwardEntry(SimpleReconnectForwarder.java:18)&lt;br/&gt;
	at org.opendaylight.controller.cluster.access.client.TransmitQueue.enqueueOrForward(TransmitQueue.java:220)&lt;br/&gt;
	at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.enqueueOrForward(AbstractClientConnection.java:179)&lt;br/&gt;
	at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.sendEntry(AbstractClientConnection.java:257)&lt;br/&gt;
	at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.sendRequest(AbstractClientConnection.java:153)&lt;br/&gt;
	at org.opendaylight.controller.cluster.databroker.actors.dds.ProxyHistory.sendRequest(ProxyHistory.java:449)&lt;br/&gt;
	at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction.sendRequest(AbstractProxyTransaction.java:327)&lt;br/&gt;
	at org.opendaylight.controller.cluster.databroker.actors.dds.RemoteProxyTransaction.sendModification(RemoteProxyTransaction.java:169)&lt;br/&gt;
	at org.opendaylight.controller.cluster.databroker.actors.dds.RemoteProxyTransaction.flushBuilder(RemoteProxyTransaction.java:162)&lt;br/&gt;
	at org.opendaylight.controller.cluster.databroker.actors.dds.RemoteProxyTransaction.sealAndSend(RemoteProxyTransaction.java:274)&lt;br/&gt;
	at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction.seal(AbstractProxyTransaction.java:339)&lt;br/&gt;
	at org.opendaylight.controller.cluster.databroker.actors.dds.ClientTransaction$$Lambda$133/1610149084.accept(Unknown Source)&lt;br/&gt;
	at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)&lt;br/&gt;
	at org.opendaylight.controller.cluster.databroker.actors.dds.ClientTransaction.ready(ClientTransaction.java:103)&lt;br/&gt;
	at org.opendaylight.controller.cluster.sharding.DistributedShardModificationContext.ready(DistributedShardModificationContext.java:49)&lt;br/&gt;
	at org.opendaylight.controller.cluster.sharding.DistributedShardModification.seal(DistributedShardModification.java:63)&lt;br/&gt;
	at org.opendaylight.controller.cluster.sharding.ShardProxyTransaction.ready(ShardProxyTransaction.java:104)&lt;br/&gt;
	at org.opendaylight.mdsal.dom.broker.ShardedDOMDataTreeWriteTransaction.doSubmit(ShardedDOMDataTreeWriteTransaction.java:143)&lt;br/&gt;
	at org.opendaylight.mdsal.dom.broker.ShardedDOMDataTreeProducer.submitTransaction(ShardedDOMDataTreeProducer.java:156)&lt;br/&gt;
	at org.opendaylight.mdsal.dom.broker.ShardedDOMDataTreeProducer.transactionSubmitted(ShardedDOMDataTreeProducer.java:239)&lt;/li&gt;
	&lt;li&gt;locked &amp;lt;0x00000000856472b8&amp;gt; (a org.opendaylight.mdsal.dom.broker.ShardedDOMDataTreeProducer)&lt;br/&gt;
	at org.opendaylight.mdsal.dom.broker.ShardedDOMDataTreeWriteTransaction.submit(ShardedDOMDataTreeWriteTransaction.java:128)&lt;/li&gt;
	&lt;li&gt;locked &amp;lt;0x00000000f5a4fba8&amp;gt; (a org.opendaylight.mdsal.dom.broker.ShardedDOMDataTreeWriteTransaction)&lt;br/&gt;
	at org.opendaylight.controller.clustering.it.provider.impl.ProduceTransactionsHandler.execWrite(ProduceTransactionsHandler.java:129)&lt;br/&gt;
	at org.opendaylight.controller.clustering.it.provider.impl.AbstractTransactionHandler.runningExecute(AbstractTransactionHandler.java:123)&lt;br/&gt;
	at org.opendaylight.controller.clustering.it.provider.impl.AbstractTransactionHandler.execute(AbstractTransactionHandler.java:105)&lt;br/&gt;
	at org.opendaylight.controller.clustering.it.provider.impl.AbstractTransactionHandler$$Lambda$260/1225246231.run(Unknown Source)&lt;br/&gt;
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)&lt;br/&gt;
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)&lt;br/&gt;
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)&lt;br/&gt;
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)&lt;br/&gt;
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)&lt;br/&gt;
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)&lt;br/&gt;
	at java.lang.Thread.run(Thread.java:748)&lt;br/&gt;
&quot;opendaylight-cluster-data-akka.actor.default-dispatcher-3&quot;:&lt;br/&gt;
	at sun.misc.Unsafe.park(Native Method)&lt;/li&gt;
	&lt;li&gt;parking to wait for  &amp;lt;0x00000000854dedf0&amp;gt; (a java.util.concurrent.locks.ReentrantLock$NonfairSync)&lt;br/&gt;
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)&lt;br/&gt;
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)&lt;br/&gt;
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)&lt;br/&gt;
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)&lt;br/&gt;
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)&lt;br/&gt;
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)&lt;br/&gt;
	at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.enqueueEntry(AbstractClientConnection.java:190)&lt;br/&gt;
	at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.enqueueRequest(AbstractClientConnection.java:170)&lt;br/&gt;
	at org.opendaylight.controller.cluster.databroker.actors.dds.ProxyHistory.enqueueRequest(ProxyHistory.java:445)&lt;br/&gt;
	at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction.enqueueRequest(AbstractProxyTransaction.java:322)&lt;br/&gt;
	at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction.enqueuePurge(AbstractProxyTransaction.java:642)&lt;br/&gt;
	at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction.enqueuePurge(AbstractProxyTransaction.java:624)&lt;br/&gt;
	at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction.enqueuePurge(AbstractProxyTransaction.java:619)&lt;br/&gt;
	at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction.lambda$directCommit$4(AbstractProxyTransaction.java:516)&lt;br/&gt;
	at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction$$Lambda$138/500577411.accept(Unknown Source)&lt;br/&gt;
	at org.opendaylight.controller.cluster.access.client.ConnectionEntry.complete(ConnectionEntry.java:50)&lt;br/&gt;
	at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.lockedCheckTimeout(AbstractClientConnection.java:415)&lt;br/&gt;
	at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.runTimer(AbstractClientConnection.java:346)&lt;br/&gt;
	at org.opendaylight.controller.cluster.access.client.AbstractClientConnection$$Lambda$132/2043163015.execute(Unknown Source)&lt;br/&gt;
	at org.opendaylight.controller.cluster.access.client.ClientActorBehavior.onReceiveCommand(ClientActorBehavior.java:123)&lt;br/&gt;
	at org.opendaylight.controller.cluster.access.client.ClientActorBehavior.onReceiveCommand(ClientActorBehavior.java:46)&lt;br/&gt;
	at org.opendaylight.controller.cluster.access.client.AbstractClientActor.onReceiveCommand(AbstractClientActor.java:60)&lt;br/&gt;
	at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)&lt;br/&gt;
	at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)&lt;br/&gt;
	at akka.actor.Actor$class.aroundReceive(Actor.scala:502)&lt;br/&gt;
	at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)&lt;br/&gt;
	at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:727)&lt;br/&gt;
	at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:183)&lt;br/&gt;
	at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)&lt;br/&gt;
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)&lt;br/&gt;
	at akka.actor.ActorCell.invoke(ActorCell.scala:495)&lt;br/&gt;
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)&lt;br/&gt;
	at akka.dispatch.Mailbox.run(Mailbox.scala:224)&lt;br/&gt;
	at akka.dispatch.Mailbox.exec(Mailbox.scala:234)&lt;br/&gt;
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)&lt;br/&gt;
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)&lt;br/&gt;
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)&lt;br/&gt;
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;42&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-ci-only-carbon/2/karaf_1_threads_after.log.gz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-ci-only-carbon/2/karaf_1_threads_after.log.gz&lt;/a&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;43&amp;#93;&lt;/span&gt; &lt;a href=&quot;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-ci-only-carbon/2/odl1_karaf.log.gz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-ci-only-carbon/2/odl1_karaf.log.gz&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="52529" author="rovarga" created="Mon, 31 Jul 2017 13:44:12 +0000"  >&lt;p&gt;This is a classic AB/BA deadlock scenario involving &quot;pool-30-thread-1&quot; and &quot;opendaylight-cluster-data-akka.actor.default-dispatcher-3&quot;, each of which is dealing with a chain of connections.&lt;/p&gt;

&lt;p&gt;The former is submitting a request, hence it is taking the locks in order of succession (old to new), which is correct.&lt;/p&gt;

&lt;p&gt;The latter is running a timeout check on the last connection in the succession chain and invoking a timeout callback, which is rooted at the head of the chain. That callback ends up trying to take the locks in the same order, but because pool-30-thread-1 is waiting for the tail of the chain, which is already held, we arrive at a deadlock.&lt;/p&gt;</comment>
                            <comment id="52530" author="rovarga" created="Mon, 31 Jul 2017 14:03:52 +0000"  >&lt;p&gt;&lt;a href=&quot;https://git.opendaylight.org/gerrit/60934&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://git.opendaylight.org/gerrit/60934&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                            <customfield id="customfield_11400" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10208" key="com.atlassian.jira.plugin.system.customfieldtypes:textfield">
                        <customfieldname>External issue ID</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8898</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10201" key="com.atlassian.jira.plugin.system.customfieldtypes:url">
                        <customfieldname>External issue URL</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[https://bugs.opendaylight.org/show_bug.cgi?id=8898]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10206" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Issue Type</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10300"><![CDATA[Bug]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10204" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>ODL SR Target Milestone</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10337"><![CDATA[Carbon-SR2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10000" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|i02sgv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>