[CONTROLLER-1898] Improve NormalizedNodeData{Input,Output} QName coding Created: 28/May/19  Updated: 01/Jul/19  Resolved: 01/Jul/19

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: Sodium, Neon SR2

Type: Improvement Priority: Medium
Reporter: Robert Varga Assignee: Robert Varga
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Blocks
is blocked by CONTROLLER-1897 Improve QNameFactory performance Resolved
Relates
relates to CONTROLLER-1888 Update journal/snapshot/datastore ver... Resolved

 Description   

After addressing CONTROLLER-1897, overall performance improved by about 18%, but subsequent profiling shows readQName() still accounting for 46% of CPU time.

There are two components to this cost:

  • 30% is spent in readCodedString()
  • 61% is spent in QNameFactory.create()

The sample (a 350MiB snapshot) invokes readQName() 1.7M times, yet yields only 506 unique QNames. QNames are therefore a good candidate for the same coding we already use for Strings.

Implementing such coding will allow us to:

  • trim the snapshot, as an already-encoded QName will need one-third of the reads, i.e. 1 read of 5 bytes instead of 3 reads
  • eliminate most of the QNameFactory.create() overhead, as repeated QNames will be looked up in a local List instead of a hash-based concurrent LoadingCache
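
As a rough illustration of the coding described above, here is a minimal Java sketch. It is not the actual NormalizedNodeData stream format: the QName class is a simplified stand-in, the token values QNAME_DEF and QNAME_REF are made up for this example, and the class name QNameCodingSketch and its encode()/decode() helpers are hypothetical. It assumes that a repeated QName is written as one token byte plus a 4-byte code, which is one way to arrive at the 5 bytes mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of QName coding; NOT the actual NormalizedNodeData stream format.
final class QNameCodingSketch {
    // Simplified stand-in for the real QName class.
    static final class QName {
        final String namespace;
        final String revision;
        final String localName;

        QName(String namespace, String revision, String localName) {
            this.namespace = namespace;
            this.revision = revision;
            this.localName = localName;
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof QName)) {
                return false;
            }
            QName other = (QName) obj;
            return namespace.equals(other.namespace) && revision.equals(other.revision)
                && localName.equals(other.localName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(namespace, revision, localName);
        }
    }

    // Assumed token values, chosen for this sketch only.
    private static final byte QNAME_DEF = 1; // full definition (three strings) follows
    private static final byte QNAME_REF = 2; // 4-byte code of an earlier QName follows

    // Encodes QNames in order; a repeated QName costs 1 token byte + 4 code bytes.
    static byte[] encode(List<QName> qnames) {
        Map<QName, Integer> codes = new HashMap<>();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (QName qname : qnames) {
                Integer code = codes.get(qname);
                if (code != null) {
                    out.writeByte(QNAME_REF);
                    out.writeInt(code);
                } else {
                    codes.put(qname, codes.size());
                    out.writeByte(QNAME_DEF);
                    out.writeUTF(qname.namespace);
                    out.writeUTF(qname.revision);
                    out.writeUTF(qname.localName);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decodes 'count' QNames; repeated ones are resolved by index from a local List.
    static List<QName> decode(byte[] data, int count) {
        List<QName> table = new ArrayList<>();
        List<QName> result = new ArrayList<>(count);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                byte token = in.readByte();
                if (token == QNAME_REF) {
                    result.add(table.get(in.readInt()));
                } else if (token == QNAME_DEF) {
                    QName qname = new QName(in.readUTF(), in.readUTF(), in.readUTF());
                    table.add(qname);
                    result.add(qname);
                } else {
                    throw new IOException("Unknown token " + token);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

In this sketch the reader never hashes a repeated QName: the code is an index into the table, so the lookup is a plain List.get(). Repeated QNames also come back as the same object instance.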

Finally, this also enables us to cache NodeIdentifier instances, as NodeIdentifier is only a wrapper around a QName. Adding a secondary lookup table for caching these wrappers should allow us to lower the memory footprint of the deserialized data.
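
The secondary lookup table could look roughly like the Java sketch below. Everything in it is an assumption made for illustration: a String stands in for QName, NodeIdentifier is reduced to a bare wrapper, and the NodeIdentifierCacheSketch class with its define() and nodeIdentifier() methods is hypothetical rather than part of the existing code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; String stands in for QName, NodeIdentifier is a bare wrapper.
final class NodeIdentifierCacheSketch {
    static final class NodeIdentifier {
        final String qname;

        NodeIdentifier(String qname) {
            this.qname = qname;
        }
    }

    // Primary table: QName code -> QName, filled as definitions are read.
    private final List<String> qnames = new ArrayList<>();
    // Secondary table: QName code -> cached wrapper, filled lazily on first use.
    private final List<NodeIdentifier> identifiers = new ArrayList<>();

    // Registers a newly read QName and returns its code (its index in both tables).
    int define(String qname) {
        qnames.add(qname);
        identifiers.add(null);
        return qnames.size() - 1;
    }

    // Returns the one shared NodeIdentifier for the given code, creating it on demand.
    NodeIdentifier nodeIdentifier(int code) {
        NodeIdentifier existing = identifiers.get(code);
        if (existing == null) {
            existing = new NodeIdentifier(qnames.get(code));
            identifiers.set(code, existing);
        }
        return existing;
    }
}
```

Because both tables share the same index, every node that uses a given QName ends up holding the same wrapper instance, which is where the memory saving would come from.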

 


Generated at Wed Feb 07 19:56:43 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.