[CONTROLLER-808] Clustering : Exception on recovery Cache loader returning null for key Created: 11/Sep/14  Updated: 20/Sep/14  Resolved: 20/Sep/14

Status: Resolved
Project: controller
Component/s: mdsal
Affects Version/s: Helium
Fix Version/s: None

Type: Bug
Reporter: Moiz Raja Assignee: Moiz Raja
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Mac OS
Platform: PC


External issue ID: 1815
Priority: High

 Description   

These exceptions are seen on recovery and prevent recovery from completing.

2014-09-11 02:59:09,916 | WARN | lt-dispatcher-18 | ShardManager | 152 - com.typesafe.akka.slf4j - 2.3.4 | akka://opendaylight-cluster-data/user/shardmanager-config | Supervisor Strategy of resume applied
at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2412)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
at org.opendaylight.yangtools.yang.data.impl.schema.tree.DataNodeContainerModificationStrategy.getChild(DataNodeContainerModificationStrategy.java:81)
at org.opendaylight.yangtools.yang.data.impl.schema.tree.DataNodeContainerModificationStrategy$ContainerModificationStrategy.getChild(DataNodeContainerModificationStrategy.java:119)
at org.opendaylight.yangtools.yang.data.impl.schema.tree.RootModificationApplyOperation.getChild(RootModificationApplyOperation.java:66)
at org.opendaylight.yangtools.yang.data.impl.schema.tree.TreeNodeUtils.findNodeChecked(TreeNodeUtils.java:53)
at org.opendaylight.yangtools.yang.data.impl.schema.tree.InMemoryDataTreeModification.resolveModificationStrategy(InMemoryDataTreeModification.java:137)
at org.opendaylight.yangtools.yang.data.impl.schema.tree.InMemoryDataTreeModification.resolveModificationFor(InMemoryDataTreeModification.java:143)
at org.opendaylight.yangtools.yang.data.impl.schema.tree.InMemoryDataTreeModification.merge(InMemoryDataTreeModification.java:73)
at org.opendaylight.controller.md.sal.dom.store.impl.SnapshotBackedWriteTransaction.merge(SnapshotBackedWriteTransaction.java:85)
at org.opendaylight.controller.cluster.datastore.modification.MergeModification.apply(MergeModification.java:37)
at org.opendaylight.controller.cluster.datastore.modification.MutableCompositeModification.apply(MutableCompositeModification.java:33)
at org.opendaylight.controller.cluster.datastore.Shard.commit(Shard.java:330)
at org.opendaylight.controller.cluster.datastore.Shard.applyState(Shard.java:444)
at org.opendaylight.controller.cluster.raft.RaftActor.onReceiveRecover(RaftActor.java:151)
at org.opendaylight.controller.cluster.datastore.Shard.onReceiveRecover(Shard.java:175)
at akka.persistence.UntypedPersistentActor$$anonfun$receiveRecover$1.applyOrElse(Eventsourced.scala:433)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at akka.persistence.Eventsourced$$anonfun$akka$persistence$Eventsourced$$recoveryBehavior$1.applyOrElse(Eventsourced.scala:168)
at akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:33)
at akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:33)
at akka.persistence.Recovery$class.withCurrentPersistent(Recovery.scala:176)
at akka.persistence.UntypedPersistentActor.withCurrentPersistent(Eventsourced.scala:428)
at akka.persistence.Recovery$State$class.processPersistent(Recovery.scala:33)
at akka.persistence.Recovery$$anon$1.processPersistent(Recovery.scala:95)
at akka.persistence.Recovery$$anon$1.aroundReceive(Recovery.scala:101)
at akka.persistence.Recovery$class.aroundReceive(Recovery.scala:256)
at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(Eventsourced.scala:428)
at akka.persistence.Eventsourced$$anon$1.aroundReceive(Eventsourced.scala:35)
at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:369)
at akka.persistence.UntypedPersistentActor.aroundReceive(Eventsourced.scala:428)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2014-09-11 02:59:09,916 | WARN | lt-dispatcher-18 | OneForOneStrategy | 152 - com.typesafe.akka.slf4j - 2.3.4 | akka://opendaylight-cluster-data/user/shardmanager-config/member-3-shard-default-config | CacheLoader returned null for key (urn:opendaylight:params:xml:ns:yang:controller:config:sal-clustering-it:car-people?revision=2014-08-18)car-people.



 Comments   
Comment by Tony Tkacik [ 16/Sep/14 ]

The schema for the data you are trying to write is not loaded at the time you are writing it.

Comment by Moiz Raja [ 16/Sep/14 ]

On recovery we read the modifications we recorded from disk and apply them to the state held in the InMemoryDataStore. We start recovery only after the first SchemaContext update is received. It looks like this failure can occur if that schema context does not yet contain the schema needed by the recovered data. How do we get around this?

Comment by Tony Tkacik [ 16/Sep/14 ]

Naive approach (not knowing what is in the datastore):

You also need to store the list of YANG modules (namespaces) that were in the schema context when the snapshot was written to disk, and do full recovery (using the in-memory data store) only when the current schema context contains all of those modules.
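A minimal Java sketch of this naive check, assuming each module is identified by a "namespace?revision=date" string like the one in the log above. `RecoveryGate`, `canRecover`, and `missingModules` are hypothetical names, not part of the controller codebase.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not a controller API): remembers the YANG modules that
// were in the schema context when the snapshot was written, and decides whether
// the current schema context is complete enough to replay the persisted data.
final class RecoveryGate {
    private final Set<String> persistedModules;

    RecoveryGate(Set<String> persistedModules) {
        this.persistedModules = Collections.unmodifiableSet(new HashSet<>(persistedModules));
    }

    // Full recovery is allowed only if every persisted module is present now.
    boolean canRecover(Set<String> currentModules) {
        return currentModules.containsAll(persistedModules);
    }

    // The persisted modules that the current schema context still lacks.
    Set<String> missingModules(Set<String> currentModules) {
        Set<String> missing = new HashSet<>(persistedModules);
        missing.removeAll(currentModules);
        return missing;
    }
}
```

With such a gate the shard would defer replay while `canRecover` returns false, for example while the car-people module from the log is still absent. A module whose bundle has since been uninstalled would keep the gate closed, which is the limitation raised later in this thread.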

A more advanced approach is to know all namespaces used by the stored data and to recover only after all of those modules are present, but this requires a bit more logic on the write-out path and needs to store all used namespaces (even ones used in identities and instance identifiers stored in leaves).
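The write-out side of this approach could look roughly like the following Java sketch, which walks a simplified data tree and records every module referenced either by a node name or by a QName stored inside a leaf value. `DataNode` and `NamespaceCollector` are hypothetical stand-ins, not yangtools types; QNames are assumed to be strings of the "(namespace?revision=date)localName" form seen in the log.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified data node: its QName, any QNames referenced by a leaf
// value (identityref or instance-identifier), and its children.
final class DataNode {
    final String qname;
    final List<String> valueRefs;
    final List<DataNode> children;

    DataNode(String qname, List<String> valueRefs, List<DataNode> children) {
        this.qname = qname;
        this.valueRefs = valueRefs;
        this.children = children;
    }
}

final class NamespaceCollector {
    // Extracts "namespace?revision=date" from "(namespace?revision=date)localName".
    static String moduleOf(String qname) {
        int close = qname.indexOf(')');
        if (!qname.startsWith("(") || close < 0) {
            throw new IllegalArgumentException("Not a QName string: " + qname);
        }
        return qname.substring(1, close);
    }

    // Walks the tree iteratively and records every module used by node names
    // and by QNames stored inside leaf values.
    static Set<String> collect(DataNode root) {
        Set<String> modules = new LinkedHashSet<>();
        Deque<DataNode> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            DataNode node = pending.pop();
            modules.add(moduleOf(node.qname));
            for (String ref : node.valueRefs) {
                modules.add(moduleOf(ref));
            }
            for (DataNode child : node.children) {
                pending.push(child);
            }
        }
        return modules;
    }
}
```

The resulting set would be persisted alongside the data, and recovery would wait until the schema context contains all of it. Scanning leaf values is what makes this costlier than the naive approach.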

Comment by Moiz Raja [ 16/Sep/14 ]

For persistence/recovery we are using akka-persistence. If we stored schema-related info using akka-persistence, we would need to recover first to get to the schema, and that recovery would also replay the data.

To recover conditionally based on the presence of all required schema elements would therefore require us to go outside the bounds of akka-persistence.

One possible solution would be for the schemaContext itself to be serializable; that would give us a proper alternative to use for recovery. It would work like this: when we receive a schemaContext during a normal run, we persist it to disk (using akka-persistence). On recovery we load the schemaContext from disk, apply it to the in-memory store, and only then load the data. This seems like the most straightforward approach (if schemaContext were serializable, that is).
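The ordering argument can be sketched in plain Java, assuming the schema could be reduced to a serializable set of module identifiers (which is exactly the open question). `JournalEntry`, `SchemaPersisted`, `ModificationPersisted`, and `SchemaFirstRecovery` are hypothetical names; this is not akka-persistence code, only a model of replaying a journal in order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical journal entries, assuming the schema is reducible to module ids.
interface JournalEntry {}

final class SchemaPersisted implements JournalEntry {
    final Set<String> modules;

    SchemaPersisted(Set<String> modules) {
        this.modules = Collections.unmodifiableSet(new HashSet<>(modules));
    }
}

final class ModificationPersisted implements JournalEntry {
    final String module; // module whose schema the modification needs
    final String path;   // simplified stand-in for the modified data path

    ModificationPersisted(String module, String path) {
        this.module = module;
        this.path = path;
    }
}

// Replays entries in journal order, as a recovering persistent actor would.
// Because a schema entry is journalled before any data that depends on it,
// the schema is already restored when each modification is replayed.
final class SchemaFirstRecovery {
    private final Set<String> schemaModules = new HashSet<>();
    private final List<String> appliedPaths = new ArrayList<>();

    void replay(List<JournalEntry> journal) {
        for (JournalEntry entry : journal) {
            if (entry instanceof SchemaPersisted) {
                // The most recently persisted schema replaces the previous one.
                schemaModules.clear();
                schemaModules.addAll(((SchemaPersisted) entry).modules);
            } else if (entry instanceof ModificationPersisted) {
                ModificationPersisted mod = (ModificationPersisted) entry;
                if (!schemaModules.contains(mod.module)) {
                    // Mirrors the failure in the log: data replayed without its schema.
                    throw new IllegalStateException("Schema not loaded for " + mod.module);
                }
                appliedPaths.add(mod.path);
            }
        }
    }

    List<String> appliedPaths() {
        return Collections.unmodifiableList(appliedPaths);
    }
}
```

The design relies only on journal order: as long as the schema event is persisted before the data it describes, replay never needs an externally injected schema context.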

Comment by Moiz Raja [ 17/Sep/14 ]

Let's say I had some data in the datastore for some module and later on I set a schema context on the datastore which did not contain a definition for that module. What would happen? Would it not be possible to retrieve that data from the datastore?

Comment by Tom Pantelis [ 19/Sep/14 ]

I think the way SchemaContext works is problematic. At my previous company, we had a similar issue with failures caused by data model schemas not being present yet on restart so I preloaded all data model schemas from all bundles on restart by having a component that started first and scraped the schema-related files from all RESOLVED bundles. I think, longer term, we should do a similar thing with the SchemaContext so we don't have to keep putting in hacks like this.

Comment by Moiz Raja [ 19/Sep/14 ]

Tom, I agree in principle with your point - but for Helium we do need to hack this.

Comment by Tom Pantelis [ 19/Sep/14 ]

Yeah I know - this should be investigated post-Helium.

Comment by Tony Tkacik [ 19/Sep/14 ]

GlobalBundleScanningSchemaServiceImpl does exactly that: it gets schema sources from all resolved bundles.

If I understand you correctly, you are proposing a component that behaves like GlobalBundleScanningSchemaServiceImpl but also caches schemas, so that on a subsequent restart (if nothing changed) it has all schemas ready at start?

Actually, the "hack" I am proposing is also the correct one: imagine some model bundles were uninstalled. Comparing schema context capabilities is still required, because after a restart and reinstall the capabilities of your system may have changed.

So yes, we need better heuristics for recovering the datastore in various scenarios; these should be part of Lithium and maybe also a Helium Service Release.

Comment by Tom Pantelis [ 19/Sep/14 ]

Moiz's hack should work fine for now except if a module bundle is removed from the installation. On restart, the persisted modules set would contain a stale module and the modules in the new SchemaContext wouldn't "containAll" persisted modules so we wouldn't start the shards. But that would be uncommon so maybe we can live with it for now.

It looks like it's the ModuleInfoBundleTracker that backs the SchemaContext. Currently it tracks ACTIVE bundles. I think if we changed it to track RESOLVED bundles, as GlobalBundleScanningSchemaServiceImpl does, it might fix this problem without the hack. I wouldn't expect any adverse ramifications from this.

What do you guys think?
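To illustrate why tracking RESOLVED bundles could close the startup window, here is a plain-Java model that does not use the OSGi API. `BundleState`, `ModelBundle`, and `ModuleTracker` are hypothetical names, and the sketch assumes "tracking RESOLVED" means a bundle's modules become visible once it is resolved and stay visible while it starts and runs.

```java
import java.util.EnumSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for OSGi bundle states; not org.osgi.framework.Bundle.
enum BundleState { INSTALLED, RESOLVED, STARTING, ACTIVE, STOPPING }

// Hypothetical model bundle: its lifecycle state and the YANG modules it ships.
final class ModelBundle {
    final String name;
    final BundleState state;
    final List<String> yangModules;

    ModelBundle(String name, BundleState state, List<String> yangModules) {
        this.name = name;
        this.state = state;
        this.yangModules = yangModules;
    }
}

// Collects the modules a schema context would see under a given tracking policy.
final class ModuleTracker {
    private final Set<BundleState> trackedStates = EnumSet.noneOf(BundleState.class);

    ModuleTracker(Set<BundleState> states) {
        trackedStates.addAll(states);
    }

    Set<String> visibleModules(List<ModelBundle> bundles) {
        Set<String> modules = new LinkedHashSet<>();
        for (ModelBundle bundle : bundles) {
            if (trackedStates.contains(bundle.state)) {
                modules.addAll(bundle.yangModules);
            }
        }
        return modules;
    }
}
```

If a model bundle is still only RESOLVED when a shard starts recovering, an ACTIVE-only policy hides its modules, while a policy that includes RESOLVED already exposes them.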

(In reply to Tony Tkacik from comment #9)

Comment by Moiz Raja [ 19/Sep/14 ]

Tom, I'm not sure what would be the impact of that change - I wouldn't want to introduce it in Helium.

From observation, on startup the schemaContext does NOT seem to get injected more than once most of the time. At exit, however, I see it being injected several times; I haven't investigated why yet.

Comment by Tom Pantelis [ 19/Sep/14 ]

The config system starts modules asynchronously, so many, if not all, bundles will have transitioned to ACTIVE at that point. But, of course, it depends on timing and is unpredictable. On shutdown, as module bundles are stopped, I believe they're removed from the SchemaContext, which would explain why you see updates. If so, that shouldn't screw up your changes, because newModules wouldn't "containAll" knownModules, so we wouldn't update the persistence.
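The update rule described here can be sketched in Java; `KnownModulesStore` and `onSchemaContextUpdate` are hypothetical names, not the code from the actual patch, and persistence is reduced to replacing an in-memory field. The variable names newModules and knownModules follow the comment.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the update rule (not the actual patch): the persisted
// set of known modules is replaced only when the new schema context contains
// every module already known.
final class KnownModulesStore {
    private Set<String> knownModules = Collections.emptySet();

    // Returns true if the new module set would be persisted.
    boolean onSchemaContextUpdate(Set<String> newModules) {
        if (!newModules.containsAll(knownModules)) {
            // A shrinking update, e.g. module bundles being stopped at shutdown.
            return false;
        }
        knownModules = Collections.unmodifiableSet(new HashSet<>(newModules));
        return true;
    }

    Set<String> knownModules() {
        return knownModules;
    }
}
```

Under this rule the shutdown updates are ignored. However, an uninstalled model bundle also produces a non-superset update, so the stale module stays in the persisted set, which is the caveat raised earlier in the thread.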

I understand your reservations about changing the module schema loading at this point. But it seems like it would be a benign change - I'd be interested in Tony's opinion. We can always change it post-Helium when there's more time to let it bake. Removing your "hack" should be simple.

(In reply to Moiz Raja from comment #11)

Comment by Moiz Raja [ 20/Sep/14 ]

https://git.opendaylight.org/gerrit/#/c/11346/

Generated at Wed Feb 07 19:53:55 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.