- Improvement
- Resolution: Done
- Medium
- None
- None
After addressing CONTROLLER-1897, overall performance improved by about 18%, but subsequent profiling shows readQName() still accounting for 46% of CPU time.
There are two components to this cost:
- 30% is spent in readCodedString()
- 61% is spent in QNameFactory.create()
Reading the sample (a 350MiB snapshot) invokes readQName() 1.7M times, yet yields only 506 unique QNames. QNames are therefore an obvious candidate for the same coding we already use for Strings.
Implementing such coding will allow us to:
- trim the snapshot, as an already-encoded QName will need one-third of the reads, i.e. a single 5-byte read instead of three reads
- eliminate most of the QNameFactory.create() overhead, as repeated QNames will be looked up in a local List instead of the hash-based concurrent LoadingCache
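The coding described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual stream implementation: the marker bytes QNAME_DEF and QNAME_REF, the SimpleQName stand-in class, and the Writer/Reader helpers are all hypothetical, and the 5-byte reference is assumed to be a 1-byte marker followed by a 4-byte index.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class QNameCodingSketch {
    // Hypothetical marker bytes, not taken from the real stream format.
    static final byte QNAME_DEF = 1; // full definition (three strings) follows
    static final byte QNAME_REF = 2; // 4-byte index into the coded table follows

    /** Minimal stand-in for a real QName (assumption for illustration). */
    static final class SimpleQName {
        final String namespace;
        final String revision;
        final String localName;

        SimpleQName(String namespace, String revision, String localName) {
            this.namespace = namespace;
            this.revision = revision;
            this.localName = localName;
        }
    }

    /** Writes each distinct QName once, then refers to it by index. */
    static final class Writer {
        private final Map<String, Integer> codes = new HashMap<>();
        private final DataOutputStream out;

        Writer(DataOutputStream out) {
            this.out = out;
        }

        void writeQName(String ns, String rev, String local) throws IOException {
            String key = ns + '\n' + rev + '\n' + local;
            Integer code = codes.get(key);
            if (code != null) {
                // Already encoded: one read of 5 bytes on the other side.
                out.writeByte(QNAME_REF);
                out.writeInt(code);
                return;
            }
            codes.put(key, codes.size());
            out.writeByte(QNAME_DEF);
            out.writeUTF(ns);
            out.writeUTF(rev);
            out.writeUTF(local);
        }
    }

    /** Resolves references from a plain List, avoiding a hash-based cache. */
    static final class Reader {
        private final List<SimpleQName> codedQNames = new ArrayList<>();
        private final DataInputStream in;
        int created; // how many QName objects were actually constructed

        Reader(DataInputStream in) {
            this.in = in;
        }

        SimpleQName readQName() throws IOException {
            byte marker = in.readByte();
            switch (marker) {
                case QNAME_REF:
                    return codedQNames.get(in.readInt());
                case QNAME_DEF:
                    String ns = in.readUTF();
                    String rev = in.readUTF();
                    String local = in.readUTF();
                    SimpleQName qname = new SimpleQName(ns, rev, local);
                    codedQNames.add(qname);
                    created++;
                    return qname;
                default:
                    throw new IOException("Unknown QName marker " + marker);
            }
        }
    }
}
```

In this sketch the index assigned by the writer matches the position in the reader's List because both sides see definitions in the same order. In the real code the definition branch would presumably still go through QNameFactory.create(), but only once per unique QName.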
Finally, this also enables us to cache NodeIdentifier instances, as NodeIdentifier is only a wrapper around a QName. Adding a secondary lookup table for caching these wrappers should allow us to lower the memory footprint of the deserialized data.
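A secondary lookup table of this kind might look like the sketch below. It is a hedged illustration only: SimpleQName and SimpleNodeIdentifier are hypothetical stand-ins for the real classes, and the register()/nodeIdentifier() methods are assumed names, not existing API.

```java
import java.util.ArrayList;
import java.util.List;

final class NodeIdentifierCacheSketch {
    /** Minimal stand-in for a real QName (assumption for illustration). */
    static final class SimpleQName {
        final String localName;

        SimpleQName(String localName) {
            this.localName = localName;
        }
    }

    /** Stand-in for NodeIdentifier: just a wrapper around a QName. */
    static final class SimpleNodeIdentifier {
        final SimpleQName nodeType;

        SimpleNodeIdentifier(SimpleQName nodeType) {
            this.nodeType = nodeType;
        }
    }

    // Primary table: QNames in the order they were first decoded.
    private final List<SimpleQName> codedQNames = new ArrayList<>();
    // Secondary table: lazily filled wrappers, indexed by the same code.
    private final List<SimpleNodeIdentifier> codedNodeIdentifiers = new ArrayList<>();

    /** Records a newly decoded QName and returns its code. */
    int register(SimpleQName qname) {
        codedQNames.add(qname);
        codedNodeIdentifiers.add(null); // wrapper is created on first use
        return codedQNames.size() - 1;
    }

    /** Returns the shared wrapper for a code, creating it at most once. */
    SimpleNodeIdentifier nodeIdentifier(int code) {
        SimpleNodeIdentifier existing = codedNodeIdentifiers.get(code);
        if (existing == null) {
            existing = new SimpleNodeIdentifier(codedQNames.get(code));
            codedNodeIdentifiers.set(code, existing);
        }
        return existing;
    }
}
```

Because every occurrence of the same code yields the same wrapper instance, the deserialized tree would hold one NodeIdentifier per unique QName rather than one per occurrence.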
- is blocked by: CONTROLLER-1897 Improve QNameFactory performance (Resolved)
- relates to: CONTROLLER-1888 Update journal/snapshot/datastore version (Resolved)