[YANGTOOLS-1074] Define NormalizedBody and specializations Created: 22/Jan/20  Updated: 19/Sep/23

Status: Confirmed
Project: yangtools
Component/s: data-impl
Affects Version/s: None
Fix Version/s: 14.0.0

Type: Improvement Priority: High
Reporter: Robert Varga Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Epic Link: Redesign NormalizedNode

 Description   

Analysis of a netvirt heap dump is showing that the data tree involved is around 640MiB, with 147MiB (~23%) retained by NormalizedNode implementations (6.5M objects). Each of these is costing typically 24 bytes, with 4 byte alignment shadow - typically containing the identifier and map of children.

If we eliminate the idea that a NormalizedNode has an identifier, we would save ~49MiB (33% shallow, 13% overall) heap by making these cost typically 16 bytes.

There are few places where identifier is required, which would have to be changed to carry Map.Entry<PathArgument, NormalizedNode> instead - a change cascading through quite a few interfaces. The most problematic will probably be NormalizedNodeStreamWriter.

Aside from memory savings, this would solve the interesting problem of DataTree root node (i.e. corresponding to SchemaContext) needing an identifier, which is currently wedged to use SchemaContext.NAME - which can lead to problems with revisions in some cases.

Consider dropping Identifiable from NormalizedNode, adjusting all users to cope with that.



 Comments   
Comment by Robert Varga [ 21/Feb/20 ]

The idea is to have:

public interface ContainerNode extends Map.Entry<NodeIdentifier, ContainerNodeState>
        [ extends SchemaNode state mess],
        Identifiable<NodeIdentifier>, ContainerNodeState {

    default NodeIdentifier getIdentifier() {
       return getKey();
    }
}

public class DefaultContainerNode extends AbstractMap.ImmutableSimpleEntry<NodeIdentifier, ContainerNodeState> implements ContainerNode {

}

which would be a compat view. Except naming sucks – is much better to not lug around "State" suffix in most places. It also has performance implications on streaming API, where we really want to propagate these from a Map.entrySet(), otherwise we could end up hurting the GC.

I do not believe we can deliver this in Aluminium.

 

Comment by Robert Varga [ 13/Nov/20 ]

The entry part of this proposal needs to be provided separately, as it would end up propagating type. The idea is that NormalizedNode is semantically equivalent to Entry<PathArgument, ? extends Object>, but we do not express that as a generic parameter by rather through an

NormalizedNode {
  default Entry<? extends PathArgument, ? extends Object> toEntry() {
    return Map.entry(getIdentifier(), body());
  }
}

We then override this method to provide more precise approcimation, so that:

ContainerNode extends NormalizedNode {
  @Override
  NodeIdentifier getIdentifier();

  @Override
  ContainerBody body()

  @Override
  default Entry<NodeIdentifier, ContainerBody> toEntry() {
    return Map.entry(getIdentifier(), body());
  }
}
Comment by Robert Varga [ 13/Nov/20 ]

We need the equivalent of ContainerBody for every NormalizedNode so that nodes can reconstructed easily. ForeignDataNodes are an exception: their body is a foreign representation, which we know nothing about (it can be a DOMSource).

Comment by Robert Varga [ 04/Nov/22 ]

There is an interesting conflict between LeafNode and AnydataNode: how do we know an AnydataNode is not represented as a byte[], which is the same as a leaf for 'type binary'.
It seems this issue is a bit more thorny: we certainly want to solve ContainerNode (due to identifier) and LeafNode/LeafSetEntryNode (for memory efficiency), but others perhaps can be left alone for later.

If we scope out those two, they boil down to:

  • ValueNode, which covers LeafNode and LeafSetEntryNode. When dealing with reconstitution, LeafSetEntryNode can be known to be allowed only in LeafSetNodes and LeafNode is a DataContainerChild which implies its parent is a DataContainerNode
  • ContainerNode is a DataContainerNode (and DataContainerChild which adds guidance from ValueNode)

Looking at DataContainerNode, it is a DistinctNodeContainer (implying child addressability) and has three specializations:

  • ContainerNode
  • LeafSetNode (note: addressability is wild – either linear position for ordered, or identifier for unordered, this is still subject to evolution)
  • MapNode (addressability, but shares ordering w.r.t. UserMapNode/UserLeafSetNode)

DataContainerNode itself covers AugmentatioNode (which is going away) and ChoiceNode, ContainerNode, MapEntryNode, MountPointNode, UnkeyedListEntryNode – not all of which reap the memory benefits.

Generated at Wed Feb 07 20:55:07 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.