[CONTROLLER-2073] Ditch Akka persistence from sal-akka-raft Created: 02/Mar/23 Updated: 15/Jan/24 |
|
| Status: | Confirmed |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | None |
| Fix Version/s: | 10.0.0 |
| Type: | Improvement | Priority: | Medium |
| Reporter: | Robert Varga | Assignee: | Tibor Král |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | pick-next, pt | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Description |
|
Our current interplay of persistence and replication is rather flawed, as we allow far too many generic things to be pluggable through interfaces which were not designed to handle them. There are four aspects here:
We use an on-heap cache of entries serialized to byte[] to serve (1), and we use Akka Persistence to provide (2) via a custom hack which overrides the normal durability knobs, all the while plugging (3) and (4) into it. So we are dealing with four layers of indirection:
At the end of the day the serialized view of each journal entry is available from atomix-storage, through a familiar growing 64-bit index, with the contents kept in a (set of) memory-mapped files. When we need to send a stream of entries to a peer, we should be able to just replay them from there, without needing to keep anything at all on heap.

The second part of the picture is the Akka Persistence protocol, which requires each object to be stored into a byte[] before anything happens to it. This absolutely sucks when combined with huge datastore writes and G1 GC's treatment of humongous objects (for which we already provide a workaround). What we really need is a streaming input/output interface, where on one side we have a Payload (say, a DataTree delta, in its native form) and on the other side a sequence of binary blocks (i.e. SegmentedJournal entries), with the translation between the two working in terms of DataInput/DataOutput (essentially), without intermediate byte[] or other buffers which would hold the entirety of the serialized form.

Long story short, this issue is about ditching the use of AbstractPersistentActor and replacing it with an explicit interface very much aligned to what atomix-storage provides, i.e. storage of bytes identified by logical numbers which we can read again if the need arises. |
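The streaming interface described above can be sketched as follows. This is a hypothetical illustration, not the actual sal-akka-raft or atomix-storage API: the names `StreamablePayload` and `KeyValueDelta` are invented for the example, and the "journal" side is a plain byte stream standing in for a memory-mapped segment. The point is that the payload writes itself field-by-field to a `DataOutput`, so no intermediate byte[] holding the entire serialized form is ever required.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch: a payload that serializes itself directly to a
// DataOutput, letting the journal layer stream it into its own segment
// buffers instead of demanding a single byte[] of the full serialized size.
interface StreamablePayload {
    void writeTo(DataOutput out) throws IOException;
}

// Example payload: a simple (key, value) delta written field-by-field.
record KeyValueDelta(String key, String value) implements StreamablePayload {
    @Override
    public void writeTo(DataOutput out) throws IOException {
        out.writeUTF(key);
        out.writeUTF(value);
    }

    static KeyValueDelta readFrom(DataInput in) throws IOException {
        return new KeyValueDelta(in.readUTF(), in.readUTF());
    }
}

public class StreamingDemo {
    public static void main(String[] args) throws IOException {
        // For illustration the sink is an in-memory stream; the real target
        // would be a memory-mapped SegmentedJournal segment.
        var sink = new ByteArrayOutputStream();
        new KeyValueDelta("node-1", "updated").writeTo(new DataOutputStream(sink));

        var read = KeyValueDelta.readFrom(
            new DataInputStream(new ByteArrayInputStream(sink.toByteArray())));
        System.out.println(read.key() + "=" + read.value());
    }
}
```

The same `DataInput` path serves the read side, so replaying an entry to a peer is a matter of re-reading the stored blocks rather than resurrecting a cached on-heap byte[].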
| Comments |
| Comment by Robert Varga [ 02/Mar/23 ] |
|
The flow should be something like:
Most notably, ReplicatedLog contents are always backed by the SegmentedJournal, i.e. purges of JournalSegmentedFile occur when the corresponding entries are trimmed, and any access to their contents (for the purposes of sending AppendEntries) goes through SegmentedJournalReader. |
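A minimal in-memory sketch of the storage contract implied above, assuming names of my own invention (`IndexedJournal`, `readFrom`, `trimTo` are illustrative, not the real atomix-storage API): entries are stored under a monotonically growing 64-bit index, can be re-read on demand to build AppendEntries batches, and trimming the replicated log purges the backing storage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a segmented journal: bytes identified by
// logical 64-bit numbers which we can read again if the need arises.
class IndexedJournal {
    private final NavigableMap<Long, byte[]> entries = new TreeMap<>();
    private long nextIndex = 1;

    // Append an entry, returning its permanent index.
    long append(byte[] data) {
        entries.put(nextIndex, data);
        return nextIndex++;
    }

    // Replay up to maxEntries entries starting at fromIndex, e.g. to build
    // an AppendEntries batch for a lagging follower. Nothing needs to stay
    // cached on heap between calls -- the journal is the source of truth.
    List<byte[]> readFrom(long fromIndex, int maxEntries) {
        var result = new ArrayList<byte[]>();
        for (var entry : entries.tailMap(fromIndex, true).entrySet()) {
            if (result.size() == maxEntries) {
                break;
            }
            result.add(entry.getValue());
        }
        return result;
    }

    // Trimming the replicated log purges storage below the given index,
    // mirroring how journal segment files would be deleted once all their
    // entries are trimmed.
    void trimTo(long index) {
        entries.headMap(index, false).clear();
    }
}

public class JournalDemo {
    public static void main(String[] args) {
        var journal = new IndexedJournal();
        journal.append("first".getBytes());
        journal.append("second".getBytes());
        // Replay everything from index 1, as if for a fresh follower.
        System.out.println(journal.readFrom(1, 10).size());
    }
}
```

The real implementation would back this with memory-mapped segment files and a `SegmentedJournalReader`-style cursor, but the contract stays the same: append by index, read by index, trim by index.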