<!-- 
RSS generated by JIRA (8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d) at Wed Feb 07 19:56:39 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>OpenDaylight JIRA</title>
    <link>https://jira.opendaylight.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>8.20.10</version>
        <build-number>820010</build-number>
        <build-date>22-06-2022</build-date>
    </build-info>


<item>
            <title>[CONTROLLER-1866] circuit breaker timed out; datastore shutdown</title>
                <link>https://jira.opendaylight.org/browse/CONTROLLER-1866</link>
                <project id="10113" key="CONTROLLER">controller</project>
                    <description>&lt;p&gt;A failure is showing up in the netvirt 1node CSIT where the output of the karaf CLI command &quot;showSvcStatus&quot; has the DATASTORE in ERROR state:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
Timestamp: Tue Oct 09 21:02:42 UTC 2018
Node IP Address: 10.30.170.157
&lt;span class=&quot;code-object&quot;&gt;System&lt;/span&gt; is operational: &lt;span class=&quot;code-keyword&quot;&gt;false&lt;/span&gt;
&lt;span class=&quot;code-object&quot;&gt;System&lt;/span&gt; ready state: ACTIVE
  OPENFLOW : OPERATIONAL
  IFM : OPERATIONAL
  ITM : OPERATIONAL
  ELAN : OPERATIONAL
  OVSDB : OPERATIONAL
  DATASTORE : ERROR java.lang.reflect.UndeclaredThrowableException
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Looking at the &lt;a href=&quot;https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-0cmb-1ctl-2cmp-openstack-queens-upstream-stateful-snat-conntrack-oxygen/86/odl_1/odl1_karaf.log.gz&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;karaf.log&lt;/a&gt;, it seems the reason is that we hit a circuit breaker timeout while persisting to the journal, after which some cluster/akka logic shut down the datastore.&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
2018-10-09T20:58:22,469 | ERROR | opendaylight-cluster-data-akka.actor.&lt;span class=&quot;code-keyword&quot;&gt;default&lt;/span&gt;-dispatcher-39 | Shard                            | 228 - org.opendaylight.controller.sal-clustering-commons - 1.7.4 | Failed to persist event type [org.opendaylight.controller.cluster.raft.persisted.SimpleReplicatedLogEntry] with sequence number [78318] &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; persistenceId [member-1-shard-&lt;span class=&quot;code-keyword&quot;&gt;default&lt;/span&gt;-config].
akka.pattern.CircuitBreaker$$anon$1: Circuit Breaker Timed out.
2018-10-09T20:58:22,515 | INFO  | opendaylight-cluster-data-shard-dispatcher-215 | Shard                            | 228 - org.opendaylight.controller.sal-clustering-commons - 1.7.4 | Stopping Shard member-1-shard-&lt;span class=&quot;code-keyword&quot;&gt;default&lt;/span&gt;-config
2018-10-09T20:58:22,517 | WARN  | opendaylight-cluster-data-akka.actor.&lt;span class=&quot;code-keyword&quot;&gt;default&lt;/span&gt;-dispatcher-70 | LocalThreePhaseCommitCohort      | 235 - org.opendaylight.controller.sal-distributed-datastore - 1.7.4 | Failed to prepare transaction member-1-datastore-config-fe-0-txn-65215-0 on backend
java.lang.RuntimeException: Transaction aborted due to shutdown.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
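
&lt;p&gt;For anyone who wants to check other runs for the same signature, here is a minimal sketch that scans plain-text karaf.log lines for the persist failure and the circuit breaker message. It assumes the log line layout shown in the excerpt above; the function names and the sample list are hypothetical and only illustrate the idea.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
import re

# Pattern for the Akka persistence error quoted in the log excerpt above.
# Groups: 1 = event type, 2 = sequence number, 3 = persistenceId.
PERSIST_FAILURE = re.compile(
    r"Failed to persist event type \[([^\]]+)\] "
    r"with sequence number \[(\d+)\] "
    r"for persistenceId \[([^\]]+)\]"
)


def find_persist_failures(lines):
    """Return one dict per 'Failed to persist' line found in the given lines."""
    failures = []
    for line in lines:
        match = PERSIST_FAILURE.search(line)
        if match is None:
            continue
        # karaf.log fields are separated by " | "; the first one is the timestamp.
        timestamp = line.split(" | ", 1)[0].strip()
        failures.append({
            "timestamp": timestamp,
            "sequence_number": int(match.group(2)),
            "persistence_id": match.group(3),
        })
    return failures


def breaker_timed_out(lines):
    """Return True if any line carries the circuit breaker time-out message."""
    return any("Circuit Breaker Timed out" in line for line in lines)


# Hypothetical sample built from the two relevant lines of the excerpt.
sample = [
    "2018-10-09T20:58:22,469 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-39 "
    "| Shard | 228 - org.opendaylight.controller.sal-clustering-commons - 1.7.4 "
    "| Failed to persist event type "
    "[org.opendaylight.controller.cluster.raft.persisted.SimpleReplicatedLogEntry] "
    "with sequence number [78318] for persistenceId [member-1-shard-default-config].",
    "akka.pattern.CircuitBreaker$$anon$1: Circuit Breaker Timed out.",
]

for failure in find_persist_failures(sample):
    print(failure["timestamp"], failure["persistence_id"], failure["sequence_number"])
print("circuit breaker timed out:", breaker_timed_out(sample))
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To scan a downloaded log, the same functions could be given the lines of the decompressed file instead of the sample list.&lt;/p&gt;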

&lt;p&gt;This is not necessarily a heavy job, so I do not suspect that it is unable to keep up with writing to disk, which I think is one reason this might happen.&lt;/p&gt;</description>
                <environment></environment>
        <key id="30902">CONTROLLER-1866</key>
            <summary>circuit breaker timed out; datastore shutdown</summary>
                <type id="10104" iconUrl="https://jira.opendaylight.org/secure/viewavatar?size=xsmall&amp;avatarId=10303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.opendaylight.org/images/icons/priorities/major.svg">Medium</priority>
                        <status id="5" iconUrl="https://jira.opendaylight.org/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="green"/>
                                    <resolution id="10003">Cannot Reproduce</resolution>
                                        <assignee username="jluhrsen">Jamo Luhrsen</assignee>
                                    <reporter username="jluhrsen">Jamo Luhrsen</reporter>
                        <labels>
                            <label>csit</label>
                            <label>csit:failures</label>
                    </labels>
                <created>Wed, 17 Oct 2018 05:32:21 +0000</created>
                <updated>Sat, 20 Oct 2018 04:46:22 +0000</updated>
                            <resolved>Sat, 20 Oct 2018 04:46:22 +0000</resolved>
                                                    <fixVersion>Neon</fixVersion>
                    <fixVersion>Fluorine SR1</fixVersion>
                    <fixVersion>Oxygen SR4</fixVersion>
                                    <component>clustering</component>
                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                                                                <comments>
                            <comment id="65371" author="tpantelis" created="Wed, 17 Oct 2018 13:09:45 +0000"  >&lt;p&gt;This failure usually means the disk is busy, overloaded, or slow. I assume the CSIT env uses shared disks, so perhaps other jobs were using the disk heavily at the time. It could also be due in part to a really high rate of transactions.&lt;/p&gt;

&lt;p&gt;The sequence number was 78318, which seems pretty high, especially for the default config shard. You say it&apos;s not a heavy job, but that seems a bit heavy, although I don&apos;t know how long the job runs or what it does. If this failure has suddenly started happening repeatedly, I would suggest looking into the app code or the test: perhaps some recent change is causing an unusually high transaction rate?&lt;/p&gt;

&lt;p&gt;Either way, the only potential remedy on the CDS, akka, or levelDB side is to increase the circuit breaker timeout (I think the default timeout is 5 sec). In controller/configuration/initial/akka.conf:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
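&lt;pre&gt;akka {
  persistence {
    journal-plugin-fallback {
      circuit-breaker {
        # journal calls that take longer than this count as breaker failures
        call-timeout = 300s
      }
    }
    snapshot-store-plugin-fallback {
      circuit-breaker {
        # failures tolerated before the breaker opens
        max-failures = 10
        call-timeout = 300s
      }
    }
  }
}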
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="65381" author="jluhrsen" created="Sat, 20 Oct 2018 04:45:43 +0000"  >&lt;p&gt;OK, I searched through ~300 job results and this was the only instance of this failure. Let&apos;s go with the theory that it was caused by external processes in this shared environment overloading the disk. I&apos;ll close it out.&lt;/p&gt;</comment>
                            <comment id="65382" author="jluhrsen" created="Sat, 20 Oct 2018 04:46:22 +0000"  >&lt;p&gt;Assumed to be caused by an overloaded environment.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_11400" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10000" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0|i03jpz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>