[CONTROLLER-1761] Uncaught error from thread ... shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled Created: 28/Aug/17  Updated: 19/Oct/17  Resolved: 04/Sep/17

Status: Resolved
Project: controller
Component/s: mdsal
Affects Version/s: Carbon
Fix Version/s: None

Type: Bug
Reporter: Michael Vorburger Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Duplicate
duplicates CONTROLLER-1755 RaftActor lastApplied index moves bac... Resolved
External issue ID: 9061

 Description   

This was seen on the same environment as the one described in CONTROLLER-1756, but the problem described here occurred without any OOM (I'm told) - thus filing it as a separate new issue:

We're seeing ODL "commit suicide" (shut itself down via System.exit), with the last line in the log being:

2017-08-24 10:03:04,487 | ERROR | -dispatcher-5689 | ActorSystemImpl | 186 - com.typesafe.akka.slf4j - 2.4.18 | Uncaught error from thread [opendaylight-cluster-data-shard-dispatcher-88] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled
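
For context, this is Akka's standard behaviour: when a fatal java.lang.Error escapes a dispatcher thread and 'akka.jvm-exit-on-fatal-error' is enabled (it is on by default in Akka's reference configuration), the ActorSystem logs exactly this message and then terminates the JVM via System.exit. A minimal sketch of how the effective value can be inspected with the Typesafe Config API - a hypothetical helper class, not part of the ODL codebase:

    import com.typesafe.config.Config;
    import com.typesafe.config.ConfigFactory;

    // Hypothetical helper, not ODL code: prints the effective Akka setting that
    // causes the System.exit seen in the log line above.
    public final class AkkaFatalErrorSettingCheck {
        public static void main(String[] args) {
            // ConfigFactory.load() merges application.conf and every reference.conf
            // on the classpath, including Akka's, where this setting defaults to "on".
            Config config = ConfigFactory.load();
            boolean exitOnFatalError = config.getBoolean("akka.jvm-exit-on-fatal-error");
            System.out.println("akka.jvm-exit-on-fatal-error = " + exitOnFatalError);
        }
    }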

Before that, there are numerous errors from (...)rd-dispatcher-88 like this:

2017-08-24 10:03:03,446 | ERROR | rd-dispatcher-88 | Shard | 204 - org.opendaylight.controller.sal-clustering-commons - 1.5.2.Carbon | member-1-shard-default-operational, An exception occurred while committing transaction member-2-datastore-operational-fe-1-txn-10344-0
java.lang.IllegalStateException: Store tree org.opendaylight.yangtools.yang.data.api.schema.tree.spi.LazyContainerNode@3cde34f3 and candidate base org.opendaylight.yangtools.yang.data.api.schema.tree.spi.LazyContainerNode@129ff94a differ.
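
For reference, this IllegalStateException is the data tree's consistency check failing: the transaction's candidate was prepared against a base tree that is no longer the current store root, i.e. the store state moved underneath the candidate (consistent with the CONTROLLER-1755 duplicate noted in the issue links). A minimal, hypothetical sketch of that kind of check - not the actual org.opendaylight.yangtools implementation:

    // Illustrative sketch only: a commit is rejected when the candidate was
    // prepared against a base that is no longer the current store root.
    final class InMemoryStoreSketch<N> {
        private N currentRoot;

        InMemoryStoreSketch(N initialRoot) {
            this.currentRoot = initialRoot;
        }

        synchronized void commit(N candidateBase, N newRoot) {
            // Reference comparison: the candidate must have been prepared against
            // exactly the tree that is currently in place.
            if (candidateBase != currentRoot) {
                throw new IllegalStateException("Store tree " + currentRoot
                    + " and candidate base " + candidateBase + " differ.");
            }
            currentRoot = newRoot;
        }
    }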

Full log is attached.



 Comments   
Comment by Michael Vorburger [ 28/Aug/17 ]

> Full log is attached.

At 2.8 MB it was too big to attach, so it's on https://drive.google.com/open?id=0B7gTXYDlI5sLR3ZhcmNZZkw1OEU

Comment by Michael Vorburger [ 28/Aug/17 ]

> Full log is attached.

There are a lot of other errors in that log as well (hard to be sure what is relevant and what is not), including the following kind (from different components, but both having the same root cause AFAIK) - are these related to the problem here:

2017-08-24 10:02:42,772 | ERROR | Pool-1-worker-81 | DataStoreJobCoordinator | 319 - org.opendaylight.genius.mdsalutil-api - 0.2.2.Carbon | Exception when executing jobEntry: JobEntry

{key='VPNINTERFACE-7b4c0454-6452-4e5f-9856-aae03a9b223e', mainWorker=org.opendaylight.netvirt.vpnmanager.InterfaceStateChangeListener$$Lambda$919/186376061@7bb23fe4, rollbackWorker=null, retryCount=0, futures=null}

java.lang.RuntimeException: TransactionCommitFailedException

{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-operational/member-1-shard-default-operational#856921107)]] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.ReadyLocalTransaction".]]}

2017-08-24 10:02:42,832 | ERROR | CommitFutures-11 | TransactionChainManager | 307 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Transaction commit failed.
TransactionCommitFailedException

{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-operational/member-1-shard-default-operational#856921107)]] after [30000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.ReadyLocalTransaction".]]}
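
For context, the AskTimeoutException quoted above comes from Akka's ask pattern: per the log, a ReadyLocalTransaction message was sent to the shard actor and the returned future failed because no reply arrived within the 30 s timeout. A minimal, generic sketch of that pattern using the Akka 2.4 Java API - the class and method names are placeholders, not the actual ODL frontend code:

    import akka.actor.ActorSelection;
    import akka.actor.ActorSystem;
    import akka.pattern.Patterns;
    import akka.util.Timeout;
    import scala.concurrent.Future;
    import java.util.concurrent.TimeUnit;

    // Generic ask-pattern sketch; not the ODL datastore frontend.
    final class ShardAskSketch {
        // Sends a message to the shard actor and returns a Future that fails with
        // akka.pattern.AskTimeoutException (as quoted above) if no reply arrives
        // within 30 seconds.
        static Future<Object> askShard(ActorSystem system, Object message) {
            Timeout askTimeout = new Timeout(30, TimeUnit.SECONDS);
            ActorSelection shard = system.actorSelection(
                "/user/shardmanager-operational/member-1-shard-default-operational");
            return Patterns.ask(shard, message, askTimeout);
        }
    }
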
Comment by Robert Varga [ 28/Aug/17 ]

Have you checked with latest Carbon?

Comment by Michael Vorburger [ 28/Aug/17 ]

> Have you checked with latest Carbon?

No, this was seen with the Carbon build mentioned in CONTROLLER-1756 (opendaylight-6.2.0-0.1.20170817rel1931.el7.noarch.rpm) - but can we deduce from your question that you hope recent fixes, e.g. CONTROLLER-1755 or CONTROLLER-1746, may solve this? If so, just to learn, why?

But we'll redo this scale test ASAP - now waiting for OPNFLWPLUG-933 c/62358 merge...

Comment by Robert Varga [ 28/Aug/17 ]

Well, for one, the ISE mentioned in the description is a duplicate of CONTROLLER-1755. Fixing that may again mask whoever is not catching their exception...

Comment by Robert Varga [ 04/Sep/17 ]

As far as I can tell, this is fixed with 9028. Please reopen with updated codebase/logs if it persists.
