[CONTROLLER-2044] Improve sal-akka-raft serialization protocol Created: 26/May/22 Updated: 16/Jan/24 |
|
| Status: | Confirmed |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | None |
| Fix Version/s: | 10.0.0, 9.0.1 |
| Type: | Improvement | Priority: | High |
| Reporter: | Robert Varga | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | pt | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Issue Links: |
|
| Description |
|
The cutover to Akka Artery and the tell-based protocol (which puts more strain on persistence) surfaces problems with serialization of our messages. Given that Akka has deprecated Java serialization as the protocol to use, and given the issues mentioned, it would be prudent to implement a better serialization protocol around Payloads, both to remove the dependency on Java serialization and to improve its efficiency.

This mostly revolves around org.opendaylight.controller.cluster.raft.messages.Payload extensibility: we really want to attach a PayloadRegistry which will handle dispatch to serialization based on a single-byte type code, and base it on the DataOutput/WritableObject interfaces. This will allow us to reduce the message overheads, which are significant.

We should also handle sal-akka-raft message serialization through a similar interface and tie it in with sal-akka-segmented-journal, so that it can operate more efficiently than through Serializable. |
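A minimal sketch of the single-byte dispatch idea described above, assuming a hypothetical PayloadRegistry/PayloadSerdes shape (these names are illustrative, not existing controller APIs): the type code is the only per-message framing, and the rest of the bytes are whatever the per-type serializer writes through DataOutput.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    // Per-type serializer: writes/reads one payload kind through DataOutput/DataInput.
    interface PayloadSerdes<T> {
        void writeTo(T payload, DataOutput out) throws IOException;

        T readFrom(DataInput in) throws IOException;
    }

    // Hypothetical registry: dispatch is driven by a single-byte type code,
    // so no class names or Java serialization headers end up on the wire.
    final class PayloadRegistry {
        private final PayloadSerdes<?>[] serdes = new PayloadSerdes<?>[256];

        <T> void register(byte typeCode, PayloadSerdes<T> impl) {
            serdes[typeCode & 0xFF] = impl;
        }

        <T> void write(byte typeCode, T payload, DataOutput out) throws IOException {
            @SuppressWarnings("unchecked")
            final PayloadSerdes<T> impl = (PayloadSerdes<T>) serdes[typeCode & 0xFF];
            out.writeByte(typeCode);   // single byte of framing, then the payload body
            impl.writeTo(payload, out);
        }

        Object read(DataInput in) throws IOException {
            // a real implementation would validate unknown type codes instead of NPE-ing
            return serdes[in.readUnsignedByte()].readFrom(in);
        }
    }

Compared to Java serialization, the per-message overhead drops to one byte plus whatever the per-type serializer emits, at the cost of maintaining the registry.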
| Comments |
| Comment by Tibor Král [ 11/Nov/22 ] |
|
Even when we design and implement the new serialization utilities and the PayloadRegistry, we cannot simply replace the old Java serialization. Since this is a breaking change, we will need to support both the old solution and the new one. The old one will need to be deprecated first and removed in a subsequent major release. |
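One possible shape of that dual-format support, as a hedged sketch only: peek at the stream header and take the legacy path when it carries the Java serialization magic (0xACED), otherwise hand off to the new single-byte dispatch. CompatPayloadReader is an invented name and the wiring to the PayloadRegistry sketch above is illustrative; the real migration could equally use an explicit version prefix.

    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectStreamConstants;

    final class CompatPayloadReader {
        private final PayloadRegistry registry;   // from the sketch above

        CompatPayloadReader(PayloadRegistry registry) {
            this.registry = registry;
        }

        Object read(byte[] bytes) throws IOException, ClassNotFoundException {
            final var in = new DataInputStream(new ByteArrayInputStream(bytes));
            in.mark(2);
            final int magic = in.readUnsignedShort();
            in.reset();
            if (magic == (ObjectStreamConstants.STREAM_MAGIC & 0xFFFF)) {
                // legacy path: java.io.Serializable, deprecated first, removed later
                try (var ois = new ObjectInputStream(in)) {
                    return ois.readObject();
                }
            }
            // new path: single-byte type dispatch
            return registry.read(in);
        }
    }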
| Comment by Robert Varga [ 02/Dec/22 ] |
|
Right, except we are more flexible than that. Our upgrade matrix says 'upgrade from latest SR', which means we can very much evolve the format as long as we backport the bits needed for reading to active streams. See |
| Comment by Robert Varga [ 02/Mar/23 ] |
|
CONTROLLER-2072 further narrows the serialization interface down so as not to expose Kryo (and to eliminate it from the picture while still retaining its serialization format). This issue needs to pick up from there. The next steps are to narrow down the interface between sal-akka-segmented-journal and sal-akka-raft. CONTROLLER-2043 has interplay here, as it provides data on what our actual binary format looks like. A brief analysis shows at least the UUID being repeated, but there are certainly other, deeper connections to make. The end zone is CONTROLLER-2073, which ties the entire stack together.
|
| Comment by Robert Varga [ 02/Mar/23 ] |
|
The end-to-end serialization protocol needs to achieve something that Aeron has integrated: a single logical message (say, CommitTransactionPayload) is not written as a single byte[] prepared beforehand, but rather is scattered into multiple fragments which have some (possibly small) maximum size. In our world each such fragment would be a single SegmentedJournal entry. These then get presented to upper layers on reception, i.e. we are looking at doing reassembly. In concrete terms this means that a ReplicatedLog entry is backed by multiple SegmentedJournal entries.

For our implementation purposes, I think we should be okay not exposing fragmentation explicitly, but hiding it behind a DataOutput, which on writeInt() does the right thing behind the scenes. On reception, the callback would be a DataInput which actually goes to multiple SegmentedJournal spans (suitably abstracted as ByteBuffer or similar) and consumes them as data is being read.

I am not sure this will cut it, though: the read side may be fine being synchronous, but the write side can hit storage allocation (i.e. when SegmentedJournal creates the next file), and at least in CSIT that operation can take almost a minute. During that time we need to be able to handle RAFT messages, so as not to lose leadership (for example). |
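A rough sketch of the "hide fragmentation behind a DataOutput" idea, under the assumption that a fragment is just a byte[] handed to the journal: FragmentingOutputStream and the fragmentSink callback are invented names, and the reassembly side (a matching InputStream stitched over the stored fragments) is not shown.

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.Arrays;
    import java.util.function.Consumer;

    // Cuts the byte stream into fixed-size chunks and hands each chunk to the
    // journal as a separate entry; callers only ever see a DataOutput.
    final class FragmentingOutputStream extends OutputStream {
        private final Consumer<byte[]> fragmentSink;   // e.g. appends one SegmentedJournal entry
        private final byte[] buffer;
        private int used;

        FragmentingOutputStream(int maxFragmentSize, Consumer<byte[]> fragmentSink) {
            this.buffer = new byte[maxFragmentSize];
            this.fragmentSink = fragmentSink;
        }

        @Override
        public void write(int b) throws IOException {
            if (used == buffer.length) {
                flushFragment();
            }
            buffer[used++] = (byte) b;
        }

        @Override
        public void close() {
            if (used != 0) {
                flushFragment();
            }
        }

        private void flushFragment() {
            fragmentSink.accept(Arrays.copyOf(buffer, used));
            used = 0;
        }
    }

    // Hypothetical usage: wrap it so writeInt() etc. "do the right thing":
    //   try (var out = new DataOutputStream(new FragmentingOutputStream(4096, journal::append))) {
    //       payload.writeTo(out);
    //   }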
| Comment by Samuel Schneider [ 19/Apr/23 ] |
|
|
| Comment by Robert Varga [ 19/Apr/23 ] |
|
Right, and that's all centralized in one place and does not bring sal-akka-segmented-journal into the picture at all. So let's start at the bottom and design how things will get persisted in sal-akka-segmented-journal and atomix-storage, and what interface will be available for sal-akka-raft to plug into. That includes how things will hit storage, but also how things will get restored, potentially in a different version. The goal is not to rely on Serializable anywhere in the stack. |
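As a starting point for that bottom-up design, here is a hedged sketch of what a Serializable-free, version-aware contract between sal-akka-raft and the storage layer could look like; EntryWriter, EntryReader and formatVersion are assumptions for illustration, not an agreed API.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    // Write side: entries hit storage as bytes produced through DataOutput,
    // tagged with the format version they were written with.
    interface EntryWriter<E> {
        /** Format version this writer emits; stored alongside the entry. */
        short formatVersion();

        void writeEntry(E entry, DataOutput out) throws IOException;
    }

    // Read side: restore an entry written by a possibly older format version,
    // so upgrades can still read what the previous release persisted.
    interface EntryReader<E> {
        E readEntry(short formatVersion, DataInput in) throws IOException;
    }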
| Comment by Robert Varga [ 31/Dec/23 ] |
|
Alright, after spending a couple of days in the area, this is a tad more complicated. SegmentedJournal's API captures the RAFT journal, i.e. Journal<E> would really like to be Journal<ReplicatedLogEntry>, which is quite a different layer from where sal-akka-segmented-journal operates. https://git.opendaylight.org/gerrit/c/controller/+/109487 shows a prototype outline of what we would like to see w.r.t. the serialization API: each entry ends up being written as multiple fragments, and on read it is composed of a number of read-only ByteBuffers (backed by the mapped file). The end play is CONTROLLER-2073, which ends up integrating persistence into the ReplicatedLog implementation, so that the implementation deals with snapshots, trimming, persistence and so on.

Digging into this deeper, we have two technologies available off the shelf. The BytesMarshallable interface certainly looks like something we'd like as the base model of how a payload gets serialized/deserialized.

The second part of this is the fact that Payload users seem to want the associated Identifier: all relevant implementations have some sort of identifier, and those which do not can fake it. The lifecycle is quite murky, but it would seem advantageous to move the identifier into an envelope, so it is available without deserializing the entire payload (as it is at the head of it!).

That brings us to the Serdes API. As the mentioned patch shows, we want to side-step the need for writeObject() in sal-akka-segmented-journal by having each Payload type:
Based on this API, segmented-journal should be able to side-step writeObject() and thus eliminate the worst Java serialization overhead. In parallel to this, we need to fix the entry UUID duplications, which look like something that is internal to the journal plugin and does not need any other help. |
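To make the envelope idea concrete, here is an illustrative sketch in which the identifier is written at the head of the entry so it can be peeked without deserializing the payload body. PayloadEnvelope and PayloadCodec are invented names standing in for the eventual Serdes API, and a String identifier stands in for the real Identifier type.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    // Per-type body serializer, in the spirit of the registry sketched earlier.
    interface PayloadCodec<P> {
        void writeBody(P payload, DataOutput out) throws IOException;

        P readBody(DataInput in) throws IOException;
    }

    // Envelope: identifier first, payload body second.
    final class PayloadEnvelope<P> {
        private final String identifier;   // stand-in for the real Identifier type
        private final P payload;

        PayloadEnvelope(String identifier, P payload) {
            this.identifier = identifier;
            this.payload = payload;
        }

        void writeTo(DataOutput out, PayloadCodec<P> codec) throws IOException {
            out.writeUTF(identifier);          // identifier at the head: cheap to peek at
            codec.writeBody(payload, out);
        }

        // Only the head needs to be read to learn which transaction an entry belongs to.
        static String peekIdentifier(DataInput in) throws IOException {
            return in.readUTF();
        }

        static <P> PayloadEnvelope<P> readFrom(DataInput in, PayloadCodec<P> codec) throws IOException {
            final String id = in.readUTF();
            return new PayloadEnvelope<>(id, codec.readBody(in));
        }
    }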