[YANGTOOLS-922] Improve ordered list tracking Created: 02/Dec/18  Updated: 19/Sep/23

Status: Confirmed
Project: yangtools
Component/s: data-impl
Affects Version/s: None
Fix Version/s: 14.0.0

Type: Improvement Priority: High
Reporter: Robert Varga Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Blocks
blocks YANGTOOLS-921 Design insert/move operations on top ... Confirmed
Relates
relates to YANGTOOLS-920 ImmutableOrdered{LeafSet,Map}NodeBuil... Confirmed

 Description   

Our current implementation is using LinkedHashMap to track children, which is not entirely efficient, as we are forcing copies on each modification.

While a trivial optimization would be to switch to copying ImmutableMaps, this is not sufficient for the purposes of YANGTOOLS-921, as the lists involved can become quite large and we will require some sort of snapshot handling to support the operations.



 Comments   
Comment by Robert Varga [ 25/Mar/19 ]

This really boils down to defining an IndexedList interface, which aside from being a java.util.List provides:

  1. lookups by key, where the key may be a subset of value data
  2. insert operation
  3. move operation
  4. explicit mutable transitions, similar to ModifiableMapPhase and UnmodifiableMapPhase

Most significant problem here is maintenance of key-based indices in face of an insert/move which ends up shifting positions of the nodes. While we could have trivial secondary Map<Key, Integer>, maintaining that map would probably kill performance. It would seem that the secondary index needs to be somehow integrated with individual entries – perhaps a skip list would do the trick?

 

Comment by Robert Varga [ 12/Jul/21 ]

For the by-offset part we want something which behaves like a rope generalized for not char but object storage. That should essentially take care of in-order traversal (via Iterable.iterator()) as well random access (via List.get(int)) rather easy to implement.

An interesting problem here is going to be limiting fragmentation, such that we do not devolve to having a million two-item end nodes. This amounts to controlling the balance between the number of items stored and tree depth in a hypothetical 

 

abstract class RopeNode;

class LeafNode extends RopeNode {
    final Object[] values; // erasure of <V>
}

class TrunkNode extends RopeNode {
    final int leftWeight;
    final RopeNode left;
    final RopeNode right;
}

Maybe tracking depth and correlating it to something?

 

Comment by Robert Varga [ 12/Jul/21 ]

In terms of supported operations, we need to support  YANG patch operations at minimum.

Comment by Robert Varga [ 12/Jul/21 ]

I am not sure about the by-key part, as it seems like it needs to point somehow to a place in the rope.

I think the key realization here is that each list is a sum of edits on it, similar to what zipper does.

A tangential thought: Xanadu's Ent is described as

The way we get the ability to do this, in
our structure we've got other kinds of nodes in this tree which we refer to
as DspLoafs?.  Don't worry about the derivation of the term right now.
Actually Dsp- is for displacement.  So this DspLoaf? might have in it for
instance a plus three, what that means is take all the numbers that 
you encounter navigating downward from here and consider them to 
be offset by three, and these things get accumulated so over here if
you've got a plus seven, and then over here, the distinction, we refer to
as the split, is a four, then the split is really considered to be a 14,
so what that allows us to do is when we have the insertion point here and
you start typing, to basically rearrange the tree so that the distinction
between these two parts of the document get represented as a distinction
in the tree and then offset this part of the tree by this amount.  This
allows us to edit these large documents without bringing hardly any of
them into core.  Leaving most of them on disk untouched.

Now this displacement calculus, along with a zipper (which would hold the effective offset?) sounds like one way to deal with edits.

Comment by Robert Varga [ 12/Jul/21 ]

I wonder whether we could get away with an offset-based reference, where we have two classes of keys:

  • keys which reference the first item in a rope leaf (i.e. end node) map to a linear offset (and hence can be quickly found)
  • keys which reference any other item map to the key of the first item plus a linear delta

Hence any item can be located at most through two lookups by key and one lookup by offset. This additional indirection should allow us to efficiently maintain the index, as we really need to reindex only when the shape of the leaf changes significantly, such as when a lead item is moved between leafs (or is removed). When an item is prepended into a leaf, we might get by without reindexing at the cost of additional by-key lookups.

Perhaps we can use some sort of index/op counting to keep track of our worst-case lookup for a particular leaf and decide to reindex (which means N by-key replace operations where N is the number of items in the leaf).

 

Comment by Robert Varga [ 12/Jul/21 ]

The modifiable phase essentially boils down to maintaining a zipper and collapsing it to a base rope/index structure.

Comment by Vratko Polak [ 13/Jul/21 ]

Does that mean we need to support access by integer offset (as list index)?

The type name "target-resource-offset" suggest yes, but patching is only intended for configuration data, so lists always have keys (and leaf-lists are unique), and the examples use values such as "6" not as an offset, but as an actual string song name (key).

If no, we need only access by key, which simplifies matters considerable as we no longer need to renumber offsets.

Comment by Robert Varga [ 13/Jul/21 ]

Yes, we do need that – it is implied by java.util.List.

Generated at Wed Feb 07 20:54:41 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.