Details
- Bug
- Status: Resolved
- Resolution: Duplicate
- Post-Helium
- None
- None
- Operating System: All
- Platform: All
- 5391
Description
Found by clustering test run: https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-3node-clustering-only-beryllium/53/
The relevant report is from odl2_karaf.log (see the test run artifacts or the attachment which contains a copy of the logs):
2016-02-18 22:11:04,009 | WARN | qtp862704672-67 | BrokerFacade | 211 - org.opendaylight.netconf.sal-rest-connector - 1.3.0.SNAPSHOT | Exception by reading OPERATIONAL via Restconf: /(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)network-topology/topology/topology[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)topology-id=topology-netconf}]/node/node[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=netconf-test-device}]
java.util.concurrent.ExecutionException: ReadFailedException{message=Error executeRead ReadData for path /(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)network-topology/topology/topology[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)topology-id=topology-netconf}]/node/node[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=netconf-test-device}], errorList=[RpcError [message=Error executeRead ReadData for path /(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)network-topology/topology/topology[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)topology-id=topology-netconf}]/node/node[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=netconf-test-device}], severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-2-shard-topology-operational currently has no leader. Try again later.]]}
    at org.opendaylight.yangtools.util.concurrent.MappingCheckedFuture.wrapInExecutionException(MappingCheckedFuture.java:63)
and in odl1_karaf.log (the timestamps are odd: according to them, the error below happened about 2 minutes BEFORE the error above):
2016-02-18 22:09:12,905 | WARN | lt-dispatcher-50 | ConcurrentDOMDataBroker | 143 - org.opendaylight.controller.sal-distributed-datastore - 1.3.0.SNAPSHOT | Tx: DOM-CHAIN-0-0 Error during phase CAN_COMMIT, starting Abort
akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@10.30.11.66:2550/), Path(/user/shardmanager-config/member-3-shard-topology-config)]] after [5000 ms]
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)[128:com.typesafe.akka.actor:2.3.14]
    at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)[128:com.typesafe.akka.actor:2.3.14]
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:599)[125:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
    at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)[125:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:597)[125:org.scala-lang.scala-library:2.11.7.v20150622-112736-1fbce4612c]
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)[128:com.typesafe.akka.actor:2.3.14]
    at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)[128:com.typesafe.akka.actor:2.3.14]
    at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)[128:com.typesafe.akka.actor:2.3.14]
    at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)[128:com.typesafe.akka.actor:2.3.14]
    at java.lang.Thread.run(Thread.java:745)[:1.7.0_85]
2016-02-18 22:09:12,909 | ERROR | CommitFutures-1 | TopologyNodeWriter | 240 - org.opendaylight.netconf.topology - 1.0.0.SNAPSHOT | org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter@63d9f743: TransactionChain(DOM-CHAIN-0-0) TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@10.30.11.66:2550/), Path(/user/shardmanager-config/member-3-shard-topology-config)]] after [5000 ms]]]}FAILED!
2016-02-18 22:09:12,909 | ERROR | CommitFutures-2 | TopologyNodeWriter | 240 - org.opendaylight.netconf.topology - 1.0.0.SNAPSHOT | topology-netconf: Transaction(init topology container) DOM-CHAIN-0-0 FAILED! TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@10.30.11.66:2550/), Path(/user/shardmanager-config/member-3-shard-topology-config)]] after [5000 ms]]]}
According to the discussion with the developers, the most likely cause is something like this:
- Leader election fails or something tries to write to the datastore before the leader election is done.
- Netconf topology hits the datastore failure and tries to restart.
- Netconf topology crashes because it is already registered in the entity ownership service.
CONTROLLER-1468 might be relevant, as it is about datastore operation failures when the leader is down (in this case it appears the leader is not known yet).
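For illustration only, the "Try again later" hint in the DataStoreUnavailableException message could be handled on the caller side roughly as sketched below. This is not the fix applied in OpenDaylight and not its API: NoShardLeaderError and read_with_retry are hypothetical names, and the attempt count and delays are arbitrary assumptions.

```python
import time


class NoShardLeaderError(Exception):
    """Hypothetical stand-in for a "shard currently has no leader" failure."""


def read_with_retry(read, attempts=5, initial_delay=0.5, sleep=time.sleep):
    """Call read() until it succeeds or the attempts run out.

    Waits initial_delay seconds after the first failure and doubles the
    wait after each further failure. If every attempt fails, the last
    NoShardLeaderError is re-raised to the caller.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return read()
        except NoShardLeaderError:
            if attempt == attempts:
                raise  # out of attempts: surface the failure
            sleep(delay)  # give leader election time to finish
            delay *= 2
```

A caller would pass the actual datastore read as the read argument. Retrying only helps if leader election eventually completes; it would not address the suspected crash on re-registration in the entity ownership service.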
Issue Links
- duplicates: CONTROLLER-1483 akka.pattern.AskTimeoutException on follower while BGP peer introduces 1M prefixes (Resolved)
- is blocked by: CONTROLLER-1483 akka.pattern.AskTimeoutException on follower while BGP peer introduces 1M prefixes (Resolved)