[CONTROLLER-1576] OOM when persisting a snapshot of the datastore that is bigger than 2G Created: 11/Jan/17  Updated: 25/Jul/23  Resolved: 01/Mar/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Guy Sela Assignee: Tom Pantelis
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Relates
relates to CONTROLLER-1864 A 8s "non-GC world stop JVM pause" du... Resolved
External issue ID: 7521

 Description   

Using Stable/Boron.
When datastore grew to a certain size, I received an OOM from the snapshot operation. After analysis the conclusion was that the snapshot is being serialized to a single byte[] and if the size is bigger than 2G, the code of ByteArrayOutputStream will throw an OOM error:
private static int hugeCapacity(int minCapacity)

{ if (minCapacity < 0) // overflow throw new OutOfMemoryError(); return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE; }

Log:

Uncaught error from thread [opendaylight-cluster-data-akka.actor.default-dispatcher-54] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[opendaylight-cluster-data]
java.lang.OutOfMemoryError
at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1808)
at java.io.DataOutputStream.writeInt(DataOutputStream.java:200)
at java.io.ObjectOutputStream$BlockDataOutputStream.writeInt(ObjectOutputStream.java:1943)
at java.io.ObjectOutputStream.writeInt(ObjectOutputStream.java:797)
at org.opendaylight.controller.cluster.datastore.node.utils.stream.AbstractNormalizedNodeDataOutput.writeInt(AbstractNormalizedNodeDataOutput.java:107)
at org.opendaylight.controller.cluster.datastore.node.utils.stream.NormalizedNodeOutputStreamWriter.writeString(NormalizedNodeOutputStreamWriter.java:60)
at org.opendaylight.controller.cluster.datastore.node.utils.stream.NormalizedNodeOutputStreamWriter.writeQName(NormalizedNodeOutputStreamWriter.java:47)
at org.opendaylight.controller.cluster.datastore.node.utils.stream.AbstractNormalizedNodeDataOutput.writeKeyValueMap(AbstractNormalizedNodeDataOutput.java:400)
at org.opendaylight.controller.cluster.datastore.node.utils.stream.AbstractNormalizedNodeDataOutput.startMapEntryNode(AbstractNormalizedNodeDataOutput.java:250)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter$OrderedNormalizedNodeWriter.writeMapEntryNode(NormalizedNodeWriter.java:267)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.wasProcessedAsCompositeNode(NormalizedNodeWriter.java:208)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.write(NormalizedNodeWriter.java:100)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.writeChildren(NormalizedNodeWriter.java:171)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.wasProcessedAsCompositeNode(NormalizedNodeWriter.java:238)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.write(NormalizedNodeWriter.java:100)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.writeChildren(NormalizedNodeWriter.java:171)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.wasProcessedAsCompositeNode(NormalizedNodeWriter.java:196)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.write(NormalizedNodeWriter.java:100)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.writeChildren(NormalizedNodeWriter.java:171)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter$OrderedNormalizedNodeWriter.writeMapEntryNode(NormalizedNodeWriter.java:282)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.wasProcessedAsCompositeNode(NormalizedNodeWriter.java:208)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.write(NormalizedNodeWriter.java:100)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.writeChildren(NormalizedNodeWriter.java:171)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.wasProcessedAsCompositeNode(NormalizedNodeWriter.java:238)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.write(NormalizedNodeWriter.java:100)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.writeChildren(NormalizedNodeWriter.java:171)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter$OrderedNormalizedNodeWriter.writeMapEntryNode(NormalizedNodeWriter.java:282)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.wasProcessedAsCompositeNode(NormalizedNodeWriter.java:208)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.write(NormalizedNodeWriter.java:100)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.writeChildren(NormalizedNodeWriter.java:171)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.wasProcessedAsCompositeNode(NormalizedNodeWriter.java:238)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.write(NormalizedNodeWriter.java:100)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.writeChildren(NormalizedNodeWriter.java:171)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.wasProcessedAsCompositeNode(NormalizedNodeWriter.java:223)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.write(NormalizedNodeWriter.java:100)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.writeChildren(NormalizedNodeWriter.java:171)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter$OrderedNormalizedNodeWriter.writeMapEntryNode(NormalizedNodeWriter.java:282)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.wasProcessedAsCompositeNode(NormalizedNodeWriter.java:208)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.write(NormalizedNodeWriter.java:100)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.writeChildren(NormalizedNodeWriter.java:171)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.wasProcessedAsCompositeNode(NormalizedNodeWriter.java:238)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.write(NormalizedNodeWriter.java:100)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.writeChildren(NormalizedNodeWriter.java:171)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.wasProcessedAsCompositeNode(NormalizedNodeWriter.java:196)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.write(NormalizedNodeWriter.java:100)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.writeChildren(NormalizedNodeWriter.java:171)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.wasProcessedAsCompositeNode(NormalizedNodeWriter.java:196)
at org.opendaylight.yangtools.yang.data.api.schema.stream.NormalizedNodeWriter.write(NormalizedNodeWriter.java:100)
at org.opendaylight.controller.cluster.datastore.node.utils.stream.AbstractNormalizedNodeDataOutput.writeNormalizedNode(AbstractNormalizedNodeDataOutput.java:157)
at org.opendaylight.controller.cluster.datastore.utils.SerializationUtils.serializeNormalizedNode(SerializationUtils.java:90)
at org.opendaylight.controller.cluster.datastore.persisted.MetadataShardDataTreeSnapshot$Proxy.writeExternal(MetadataShardDataTreeSnapshot.java:56)
at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.opendaylight.controller.cluster.datastore.persisted.AbstractVersionedShardDataTreeSnapshot.versionedSerialize(AbstractVersionedShardDataTreeSnapshot.java:77)
at org.opendaylight.controller.cluster.datastore.persisted.AbstractVersionedShardDataTreeSnapshot.serialize(AbstractVersionedShardDataTreeSnapshot.java:95)
at org.opendaylight.controller.cluster.datastore.actors.ShardSnapshotActor.handleReceive(ShardSnapshotActor.java:54)
at org.opendaylight.controller.cluster.common.actor.AbstractUntypedActor.onReceive(AbstractUntypedActor.java:26)
at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:97)
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
at akka.actor.Actor$class.aroundReceive(Actor.scala:484)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Suppressed: java.lang.OutOfMemoryError



 Comments   
Comment by Colin Dixon [ 19/Jan/17 ]

Out of curiosity, how critical do we feel fixing this is? Is it critical to get it fixed in Carbon? I'm just trying to understand how immediate the needs are here.

Comment by Guy Sela [ 22/Jan/17 ]

For our deployments, it is the upper bound for the scale currently, so it is very important.
Working with full mesh of tunnels in netvirt is blocked in 100 computes.
Without full mesh, I can't give you the exact number, but from analyzing the histogram of the heap, we will definitely be blocked on a number that is lower than what we need.

Comment by Tom Pantelis [ 25/Jan/17 ]

Patch series: https://git.opendaylight.org/gerrit/#/q/status:open+project:controller+branch:master+topic:file_backed_snapshots

The remaining issue is that akka's snapshot store serializes the snapshot payload to a byte[] before persisting to disk se we'll run into the same limit issue. To fix this we'll have to write our own snapshot store implementation that serializes the payload directly to the file.

Comment by Tom Pantelis [ 31/Jan/17 ]

Full list of patches: https://git.opendaylight.org/gerrit/#/q/status:open+project:controller+branch:master+topic:bug/7521

Comment by Michael Vorburger [ 22/Sep/18 ]

Just for the record: This (or rather the lack of this in an old version) came up again in CONTROLLER-1864.

Generated at Wed Feb 07 19:55:53 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.