Experiments in the YANGTOOLS-1128 area show that using the ANTLR parse tree has a downside in the amount of memory we consume. This stems from three facts:
- ANTLR is completely transparent, hence each token carries a lot of metadata about where it comes from. We do not use most of that metadata.
- A lot of the tokens are simple separators. We do not use those tokens at all.
- Tokens are not really immutable: they are kept mutable to support use cases we do not have (see the sketch after this list).
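To illustrate the contrast, here is a minimal sketch: ANTLR's Token interface exposes type, text, position, channel, index and source back-references, while the parser only ever consumes two of those attributes. LeanToken is a hypothetical name for illustration, not an actual yangtools class.

```java
import org.antlr.v4.runtime.Token;

// ANTLR's Token exposes getType(), getText(), getLine(),
// getCharPositionInLine(), getChannel(), getTokenIndex(), getStartIndex(),
// getStopIndex(), getTokenSource() and getInputStream(), and CommonToken is
// additionally mutable through the WritableToken setters.
final class LeanToken {
    private final int type;
    private final String text;

    LeanToken(final Token token) {
        // Retain only type and text; drop positions, channel, indices and the
        // back-references which keep the token source and input stream reachable.
        this.type = token.getType();
        this.text = token.getText();
    }

    int type() {
        return type;
    }

    String text() {
        return text;
    }
}
```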
We also do not allocate strings entirely efficiently: there are plenty of models in the wild which are auto-generated and do not take advantage of YANG facilities, hence they contain a number of duplicate construct definitions, and those strings end up being duplicated in memory simply because we are geared towards sane models.
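As a rough illustration of the deduplication idea (not the actual yangtools code), a Guava weak interner can funnel every such string through a single canonical instance:

```java
import com.google.common.collect.Interner;
import com.google.common.collect.Interners;

// Hypothetical helper: identical strings parsed from different modules end up
// sharing one canonical instance, held only weakly by the interner.
final class DedupStrings {
    private static final Interner<String> STRINGS = Interners.newWeakInterner();

    static String dedup(final String str) {
        return STRINGS.intern(str);
    }

    public static void main(final String[] args) {
        // Force two distinct instances, as two independently-parsed models would.
        final String first = dedup(new String("duplicate-definition"));
        final String second = dedup(new String("duplicate-definition"));
        System.out.println(first == second); // true: only one copy is retained
    }
}
```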
Introduce a heavily-interned intermediate representation instead of relying on the ANTLR-generated tree. While interning expends a non-trivial amount of CPU cycles to deduplicate strings, for benchmark models it eliminates a large amount of duplication. This also has some bearing on the size of the effective model, which seems to benefit from the upfront work. Experimentation shows a greater than 90% reduction in memory footprint.
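A sketch of what a heavily-interned IR node could look like under these assumptions; the names are illustrative, not the actual yangtools IR types. Each statement holds interned keyword and argument strings and an immutable list of substatements, with none of the ANTLR metadata surviving past the parse:

```java
import com.google.common.collect.ImmutableList;
import com.google.common.collect.Interner;
import com.google.common.collect.Interners;
import java.util.List;

// Illustrative IR node: immutable, interned, free of ANTLR metadata.
final class IrStatement {
    private static final Interner<String> STRINGS = Interners.newWeakInterner();

    private final String keyword;
    private final String argument; // null for argument-less statements
    private final ImmutableList<IrStatement> substatements;

    IrStatement(final String keyword, final String argument,
            final List<IrStatement> substatements) {
        // CPU spent here on interning is repaid through sharing across the
        // many duplicate definitions found in auto-generated models.
        this.keyword = STRINGS.intern(keyword);
        this.argument = argument == null ? null : STRINGS.intern(argument);
        this.substatements = ImmutableList.copyOf(substatements);
    }

    String keyword() {
        return keyword;
    }

    String argument() {
        return argument;
    }

    List<IrStatement> substatements() {
        return substatements;
    }
}
```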
Split from: YANGTOOLS-1128 Add a dedicated ANTLR token factory (Resolved)