[CONTROLLER-858] Clustering: Update Cluster datastore to provide notification/API to check its availability for clients Created: 18/Sep/14  Updated: 19/Oct/17  Resolved: 05/Jul/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: Helium
Fix Version/s: None

Type: Improvement
Reporter: Wojciech Dec
Assignee: Unassigned
Resolution: Won't Do
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Blocks
is blocked by CONTROLLER-1483 akka.pattern.AskTimeoutException on f... Resolved

 Description   

When attempting to install odl-mdsal-all via karaf, one gets the following errors.

Steps to reproduce:

  • Download the latest integration branch
  • mvn clean install
  • Start karaf and run feature:install odl-mdsal-all

opendaylight-user@root>Exception in thread "config-pusher" java.lang.IllegalStateException: Failed to send commit for configuration 04-xsql.xml(odl-mdsal-all,odl-mdsal-xsql)
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.sendRequestGetResponseCheckIsOK(ConfigPusherImpl.java:292)
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.pushConfig(ConfigPusherImpl.java:246)
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.pushConfigWithConflictingVersionRetries(ConfigPusherImpl.java:135)
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.internalPushConfigs(ConfigPusherImpl.java:111)
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.process(ConfigPusherImpl.java:78)
at org.opendaylight.controller.netconf.persist.impl.osgi.ConfigPersisterActivator$InnerCustomizer$1.run(ConfigPersisterActivator.java:178)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.IllegalStateException: Error - getInstance() failed for ModuleIdentifier{factoryName='XSQL', instanceName='XSQL'} in transaction TransactionIdentifier{name='ConfigTransaction-50-52'}
at org.opendaylight.controller.config.manager.impl.ConfigTransactionControllerImpl.secondPhaseCommit(ConfigTransactionControllerImpl.java:402)
at org.opendaylight.controller.config.manager.impl.ConfigRegistryImpl.secondPhaseCommit(ConfigRegistryImpl.java:282)
at org.opendaylight.controller.config.manager.impl.ConfigRegistryImpl.commitConfig(ConfigRegistryImpl.java:229)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:193)
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:175)
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:117)
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:54)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at com.sun.jmx.mbeanserver.MXBeanProxy$InvokeHandler.invoke(MXBeanProxy.java:150)
at com.sun.jmx.mbeanserver.MXBeanProxy.invoke(MXBeanProxy.java:167)
at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:252)
at com.sun.proxy.$Proxy31.commitConfig(Unknown Source)
at org.opendaylight.controller.config.util.ConfigRegistryJMXClient.commitConfig(ConfigRegistryJMXClient.java:102)
at org.opendaylight.controller.netconf.confignetconfconnector.transactions.TransactionProvider.commitTransaction(TransactionProvider.java:110)
at org.opendaylight.controller.netconf.confignetconfconnector.operations.Commit.handleWithNoSubsequentOperations(Commit.java:55)
at org.opendaylight.controller.netconf.util.mapping.AbstractLastNetconfOperation.handle(AbstractLastNetconfOperation.java:33)
at org.opendaylight.controller.netconf.util.mapping.AbstractNetconfOperation.handle(AbstractNetconfOperation.java:105)
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.sendRequestGetResponseCheckIsOK(ConfigPusherImpl.java:287)
... 6 more
Caused by: java.lang.IllegalStateException: Failed to read pre-existing data
at org.opendaylight.controller.md.sal.binding.impl.AbstractReadWriteTransaction.ensureParentsByMerge(AbstractReadWriteTransaction.java:55)
at org.opendaylight.controller.md.sal.binding.impl.AbstractWriteTransaction.put(AbstractWriteTransaction.java:44)
at org.opendaylight.controller.md.sal.binding.impl.ForwardedBackwardsCompatibleDataBroker$ForwardedBackwardsCompatibleTransacion.putOperationalData(ForwardedBackwardsCompatibleDataBroker.java:218)
at org.opendaylight.controller.md.sal.binding.impl.ForwardedBackwardsCompatibleDataBroker$ForwardedBackwardsCompatibleTransacion.putOperationalData(ForwardedBackwardsCompatibleDataBroker.java:186)
at org.opendaylight.xsql.XSQLProvider.buildXSQL(XSQLProvider.java:31)
at org.opendaylight.yang.gen.v1.http.netconfcentral.org.ns.xsql.rev140626.XSQLModule.createInstance(XSQLModule.java:26)
at org.opendaylight.yang.gen.v1.http.netconfcentral.org.ns.xsql.rev140626.AbstractXSQLModule.getInstance(AbstractXSQLModule.java:95)
at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.opendaylight.controller.config.manager.impl.dependencyresolver.DependencyResolverManager$1.handleInvocation(DependencyResolverManager.java:152)
at com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:70)
at com.sun.proxy.$Proxy70.getInstance(Unknown Source)
at org.opendaylight.controller.config.manager.impl.ConfigTransactionControllerImpl.secondPhaseCommit(ConfigTransactionControllerImpl.java:396)
... 36 more
Caused by: ReadFailedException{message=Error checking exists for path /(http://netconfcentral.org/ns/XSQL?revision=2014-06-26)XSQL, errorList=[RpcError [message=Error checking exists for path /(http://netconfcentral.org/ns/XSQL?revision=2014-06-26)XSQL, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.cluster.datastore.exceptions.TimeoutException: Sending message class org.opendaylight.controller.protobuff.messages.transaction.ShardTransactionMessages$CreateTransaction to actor ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-operational/member-1-shard-default-operational)] failed]]}
at org.opendaylight.controller.cluster.datastore.TransactionProxy$NoOpTransactionContext.dataExists(TransactionProxy.java:846)
at org.opendaylight.controller.cluster.datastore.TransactionProxy.exists(TransactionProxy.java:258)
at org.opendaylight.controller.md.sal.dom.broker.impl.DOMForwardedReadWriteTransaction.exists(DOMForwardedReadWriteTransaction.java:54)
at org.opendaylight.controller.md.sal.binding.impl.AbstractReadWriteTransaction.ensureParentsByMerge(AbstractReadWriteTransaction.java:52)
... 49 more
Caused by: org.opendaylight.controller.cluster.datastore.exceptions.TimeoutException: Sending message class org.opendaylight.controller.protobuff.messages.transaction.ShardTransactionMessages$CreateTransaction to actor ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-operational/member-1-shard-default-operational)] failed
at org.opendaylight.controller.cluster.datastore.utils.ActorContext.executeRemoteOperation(ActorContext.java:189)
at org.opendaylight.controller.cluster.datastore.utils.ActorContext.executeShardOperation(ActorContext.java:243)
at org.opendaylight.controller.cluster.datastore.TransactionProxy.createTransactionIfMissing(TransactionProxy.java:382)
at org.opendaylight.controller.cluster.datastore.TransactionProxy.exists(TransactionProxy.java:256)
... 51 more
Caused by: java.lang.IllegalStateException: Could not find leader so transaction cannot be created
at org.opendaylight.controller.cluster.datastore.Shard.onReceiveCommand(Shard.java:221)
at akka.persistence.UntypedPersistentActor.onReceive(Eventsourced.scala:430)
at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:80)
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:534)
at akka.persistence.Recovery$State$class.process(Recovery.scala:30)
at akka.persistence.ProcessorImpl$$anon$2.process(Processor.scala:103)
at akka.persistence.ProcessorImpl$$anon$2.aroundReceive(Processor.scala:114)
at akka.persistence.Recovery$class.aroundReceive(Recovery.scala:256)
at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(Eventsourced.scala:428)
at akka.persistence.Eventsourced$$anon$2.doAroundReceive(Eventsourced.scala:82)
at akka.persistence.Eventsourced$$anon$2.aroundReceive(Eventsourced.scala:78)
at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:369)
at akka.persistence.UntypedPersistentActor.aroundReceive(Eventsourced.scala:428)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)



 Comments   
Comment by Harman Singh [ 18/Sep/14 ]

This issue happens because the XSQL bundle tries to use the datastore while the clustering shards are still getting ready and are not yet available for use.

This is not a clustering issue; it should be fixed at the application level.

Comment by Harman Singh [ 18/Sep/14 ]

Ideally, each application that performs datastore-related operations should check whether the datastore is available, or perform the operation in a callback once it is available.
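
The check-or-callback approach described in this comment could look roughly like the sketch below. `DatastoreReadiness` and its methods are hypothetical names invented for illustration; they are not part of the controller API, and the sketch assumes some component calls `markReady()` once the shards are usable.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical helper, not an OpenDaylight API: tracks whether the datastore is usable.
final class DatastoreReadiness {
    private final CompletableFuture<Void> ready = new CompletableFuture<>();

    // Called by whatever component learns that the datastore can be used.
    void markReady() {
        ready.complete(null);
    }

    // Synchronous check for clients that prefer to test availability first.
    boolean isAvailable() {
        return ready.isDone();
    }

    // Runs the action immediately if the datastore is already ready,
    // otherwise runs it when markReady() is called.
    void whenAvailable(Runnable action) {
        ready.thenRun(action);
    }
}
```

An application such as XSQL would then wrap its first datastore write in `whenAvailable(...)` instead of writing during module creation.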

Comment by Sharon Aicler [ 18/Sep/14 ]

The fix should be implemented "underneath" the Data Broker interface so that applications do not have to be concerned with this race condition. If the fix is made in XSQL, the race condition will simply move to the next application that loads, so the bug will not be fixed, just shifted to another application.

Comment by Harman Singh [ 22/Sep/14 ]

The issue that we see can happen with any client application that tries to use the cluster data store. We should provide some notification of the datastore's status that applications can listen to.
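
A status notification of this kind might be shaped like the following sketch. All names here (`DatastoreStatus`, `DatastoreStatusListener`, `DatastoreStatusNotifier`) are assumptions made for illustration and do not exist in the controller; the sketch only shows the listener pattern, not how the status would actually be derived from the shards.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical status values for the clustered datastore.
enum DatastoreStatus { UNAVAILABLE, AVAILABLE }

// Hypothetical callback interface that client applications would implement.
interface DatastoreStatusListener {
    void onStatusChanged(DatastoreStatus newStatus);
}

// Hypothetical notifier that the datastore side would own.
final class DatastoreStatusNotifier {
    private final List<DatastoreStatusListener> listeners = new CopyOnWriteArrayList<>();
    private DatastoreStatus status = DatastoreStatus.UNAVAILABLE;

    // A new listener is told the current status right away,
    // so it cannot miss a transition that happened before it registered.
    synchronized void register(DatastoreStatusListener listener) {
        listeners.add(listener);
        listener.onStatusChanged(status);
    }

    // Notifies listeners only when the status actually changes.
    synchronized void publish(DatastoreStatus newStatus) {
        if (newStatus == status) {
            return;
        }
        status = newStatus;
        for (DatastoreStatusListener listener : listeners) {
            listener.onStatusChanged(newStatus);
        }
    }
}
```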

Comment by Tony Tkacik [ 16/Oct/14 ]

The clustered data store should not provide its instance to clients when it is not ready, since the client then gets a "not fully working" implementation. A better solution is to wait in create instance until clustering is initialized and then return.
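
Blocking in instance creation until clustering is initialized could be sketched as below. `GatedDatastoreFactory` is a hypothetical class, not the controller's module code, and the timeout parameter is an assumption added so that a caller cannot block forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical factory: hands out the datastore only after clustering is initialized.
final class GatedDatastoreFactory<T> {
    private final CountDownLatch initialized = new CountDownLatch(1);
    private final T datastore;

    GatedDatastoreFactory(T datastore) {
        this.datastore = datastore;
    }

    // Called once clustering has finished initializing.
    void clusteringInitialized() {
        initialized.countDown();
    }

    // Blocks the caller until initialization completes or the timeout expires.
    T createInstance(long timeout, TimeUnit unit) {
        try {
            if (!initialized.await(timeout, unit)) {
                throw new IllegalStateException(
                        "Datastore not ready after " + timeout + " " + unit);
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag before reporting the failure.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the datastore", e);
        }
        return datastore;
    }
}
```

The trade-off is that a slow or failed cluster start now delays every dependent module instead of failing one of them at first use.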

Comment by Moiz Raja [ 06/Jan/15 ]

Consensus

1. The datastore should notify when it is not available
2. An external listener should shut down the controller
3. Some external program should restart the controller (e.g. monit)

Comment by Moiz Raja [ 18/Aug/15 ]

A few things were done to help with this:

  • The datastore remains unavailable until the leaders for all shards are available. We wait for 90 seconds and then give up.
  • We retry transactions.
  • We throw a DataStoreUnavailableException when elections are going on.
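
The retry behaviour in the list above can be illustrated with a small sketch. This is not the datastore's actual retry code: `StoreUnavailableException` is a hypothetical stand-in for the exception thrown during elections, and the attempt count and delay are assumed parameters.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the exception raised while a shard has no leader.
final class StoreUnavailableException extends RuntimeException {
    StoreUnavailableException(String message) {
        super(message);
    }
}

// Hypothetical helper that retries an operation while the store is unavailable.
final class TransactionRetrier {
    private TransactionRetrier() {
    }

    static <T> T retry(Supplier<T> transaction, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        StoreUnavailableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return transaction.get();
            } catch (StoreUnavailableException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleep(delayMillis);
                }
            }
        }
        // Every attempt failed: surface the most recent failure to the caller.
        throw last;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

Only the unavailability exception is retried; any other failure propagates immediately, so genuine errors are not masked.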

There is no clear use case for a notification to be issued when the datastore is unavailable.

We may want to create a service which, given a yanginstanceid, can inform us of the availability of the underlying shard.
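
One possible shape for such a service is sketched below. Everything in it is an assumption: `ShardAvailabilityService` does not exist in the controller, a plain `String` path stands in for the instance identifier, and the prefix-to-shard mapping is a simplification of however paths are really assigned to shards.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical service: answers whether the shard behind a given path has a leader.
final class ShardAvailabilityService {
    private final Map<String, String> shardByPathPrefix = new ConcurrentHashMap<>();
    private final Set<String> shardsWithLeader = ConcurrentHashMap.newKeySet();
    private final String defaultShard;

    ShardAvailabilityService(String defaultShard) {
        this.defaultShard = defaultShard;
    }

    void mapPrefix(String pathPrefix, String shardName) {
        shardByPathPrefix.put(pathPrefix, shardName);
    }

    void leaderElected(String shardName) {
        shardsWithLeader.add(shardName);
    }

    void leaderLost(String shardName) {
        shardsWithLeader.remove(shardName);
    }

    // Longest matching prefix wins; unmapped paths fall back to the default shard.
    String resolveShard(String path) {
        String best = defaultShard;
        int bestLength = -1;
        for (Map.Entry<String, String> entry : shardByPathPrefix.entrySet()) {
            String prefix = entry.getKey();
            if (path.startsWith(prefix) && prefix.length() > bestLength) {
                best = entry.getValue();
                bestLength = prefix.length();
            }
        }
        return best;
    }

    boolean isAvailable(String path) {
        return shardsWithLeader.contains(resolveShard(path));
    }
}
```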
