[YANGTOOLS-922] Improve ordered list tracking Created: 02/Dec/18 Updated: 19/Sep/23 |
|
| Status: | Confirmed |
| Project: | yangtools |
| Component/s: | data-impl |
| Affects Version/s: | None |
| Fix Version/s: | 14.0.0 |
| Type: | Improvement | Priority: | High |
| Reporter: | Robert Varga | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Description |
|
Our current implementation is using LinkedHashMap to track children, which is not entirely efficient, as we are forcing copies on each modification. While a trivial optimization would be to switch to copying ImmutableMaps, this is not sufficient for the purposes of YANGTOOLS-921, as the lists involved can become quite large and we will require some sort of snapshot handling to support the operations. |
| Comments |
| Comment by Robert Varga [ 25/Mar/19 ] |
|
This really boils down to defining an IndexedList interface, which aside from being a java.util.List provides:
Most significant problem here is maintenance of key-based indices in face of an insert/move which ends up shifting positions of the nodes. While we could have trivial secondary Map<Key, Integer>, maintaining that map would probably kill performance. It would seem that the secondary index needs to be somehow integrated with individual entries – perhaps a skip list would do the trick?
|
| Comment by Robert Varga [ 12/Jul/21 ] |
|
For the by-offset part we want something which behaves like a rope generalized for not char but object storage. That should essentially take care of in-order traversal (via Iterable.iterator()) as well random access (via List.get(int)) rather easy to implement. An interesting problem here is going to be limiting fragmentation, such that we do not devolve to having a million two-item end nodes. This amounts to controlling the balance between the number of items stored and tree depth in a hypothetical
abstract class RopeNode; class LeafNode extends RopeNode { final Object[] values; // erasure of <V> } class TrunkNode extends RopeNode { final int leftWeight; final RopeNode left; final RopeNode right; } Maybe tracking depth and correlating it to something?
|
| Comment by Robert Varga [ 12/Jul/21 ] |
|
In terms of supported operations, we need to support YANG patch operations at minimum. |
| Comment by Robert Varga [ 12/Jul/21 ] |
|
I am not sure about the by-key part, as it seems like it needs to point somehow to a place in the rope. I think the key realization here is that each list is a sum of edits on it, similar to what zipper does. A tangential thought: Xanadu's Ent is described as The way we get the ability to do this, in our structure we've got other kinds of nodes in this tree which we refer to as DspLoafs?. Don't worry about the derivation of the term right now. Actually Dsp- is for displacement. So this DspLoaf? might have in it for instance a plus three, what that means is take all the numbers that you encounter navigating downward from here and consider them to be offset by three, and these things get accumulated so over here if you've got a plus seven, and then over here, the distinction, we refer to as the split, is a four, then the split is really considered to be a 14, so what that allows us to do is when we have the insertion point here and you start typing, to basically rearrange the tree so that the distinction between these two parts of the document get represented as a distinction in the tree and then offset this part of the tree by this amount. This allows us to edit these large documents without bringing hardly any of them into core. Leaving most of them on disk untouched. Now this displacement calculus, along with a zipper (which would hold the effective offset?) sounds like one way to deal with edits. |
| Comment by Robert Varga [ 12/Jul/21 ] |
|
I wonder whether we could get away with an offset-based reference, where we have two classes of keys:
Hence any item can be located at most through two lookups by key and one lookup by offset. This additional indirection should allow us to efficiently maintain the index, as we really need to reindex only when the shape of the leaf changes significantly, such as when a lead item is moved between leafs (or is removed). When an item is prepended into a leaf, we might get by without reindexing at the cost of additional by-key lookups. Perhaps we can use some sort of index/op counting to keep track of our worst-case lookup for a particular leaf and decide to reindex (which means N by-key replace operations where N is the number of items in the leaf).
|
| Comment by Robert Varga [ 12/Jul/21 ] |
|
The modifiable phase essentially boils down to maintaining a zipper and collapsing it to a base rope/index structure. |
| Comment by Vratko Polak [ 13/Jul/21 ] |
|
Does that mean we need to support access by integer offset (as list index)? The type name "target-resource-offset" suggest yes, but patching is only intended for configuration data, so lists always have keys (and leaf-lists are unique), and the examples use values such as "6" not as an offset, but as an actual string song name (key). If no, we need only access by key, which simplifies matters considerable as we no longer need to renumber offsets. |
| Comment by Robert Varga [ 13/Jul/21 ] |
|
Yes, we do need that – it is implied by java.util.List. |