[CONTROLLER-1512] sal-akka-raft: Use akka.dispatch.ControlMessage for messages
Created: 18/Apr/16  Updated: 25/Jul/23  Resolved: 05/Jun/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Robert Varga
Assignee: Tom Pantelis
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Blocks
blocks CONTROLLER-1513 sal-akka-raft: separate Shard and Raf... Resolved
blocks CONTROLLER-1675 Leadership transfer failed: Follower ... Confirmed
External issue ID: 5740

 Description   

RaftActor implements a time-sensitive message exchange, but it is also subclassed to provide implementation-specific operations.

When the subclass is targeted with a lot of messages, this can lead to timeouts in the RAFT underlay – hence potentially disrupting stability.

Akka provides a way to divide messages into two classes:

  • make important messages implement akka.dispatch.ControlMessage
  • use one of the ControlAwareMailbox implementations

ControlAwareMailboxes use two internal queues, giving priority to delivering messages which are instanceof ControlMessage and delivering other messages only when no ControlMessage is present.

This way we can ensure that the RAFT algorithm operates isolated from any floods incurred by the implementation above, leading to better stability.
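
The two-queue behaviour described above can be modelled without Akka. The following is a minimal sketch only, not Akka's implementation: the ControlMessage interface here is a local stand-in for akka.dispatch.ControlMessage, and the Heartbeat and ClientRequest classes are hypothetical names chosen for illustration.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Plain-Java model of control-aware mailbox semantics (illustrative only). */
final class ControlAwareQueueSketch {
    /** Local stand-in for the akka.dispatch.ControlMessage marker interface. */
    interface ControlMessage {}

    /** Hypothetical RAFT-level message: marked, so it is delivered first. */
    static final class Heartbeat implements ControlMessage {
        final long term;
        Heartbeat(long term) { this.term = term; }
        @Override public String toString() { return "Heartbeat(" + term + ")"; }
    }

    /** Hypothetical application-level message: unmarked, so it waits. */
    static final class ClientRequest {
        final String payload;
        ClientRequest(String payload) { this.payload = payload; }
        @Override public String toString() { return "ClientRequest(" + payload + ")"; }
    }

    private final Queue<Object> controlQueue = new ConcurrentLinkedQueue<>();
    private final Queue<Object> queue = new ConcurrentLinkedQueue<>();

    /** Route the message to one of the two queues based on the marker. */
    void enqueue(Object message) {
        if (message instanceof ControlMessage) {
            controlQueue.add(message);
        } else {
            queue.add(message);
        }
    }

    /** Drain control messages first; fall back to normal messages. Returns null when empty. */
    Object dequeue() {
        Object control = controlQueue.poll();
        return control != null ? control : queue.poll();
    }

    public static void main(String[] args) {
        ControlAwareQueueSketch mailbox = new ControlAwareQueueSketch();
        mailbox.enqueue(new ClientRequest("a"));
        mailbox.enqueue(new Heartbeat(1));
        mailbox.enqueue(new ClientRequest("b"));
        mailbox.enqueue(new Heartbeat(2));
        // Print messages in delivery order until the mailbox is empty.
        for (Object m = mailbox.dequeue(); m != null; m = mailbox.dequeue()) {
            System.out.println(m);
        }
    }
}
```

In this model both heartbeats are delivered before either client request, even though a client request was enqueued first. Order is preserved within each class of message, but not across the two classes.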

Performance impact and interactions with persistence need to be evaluated before this becomes our default.



 Comments   
Comment by Muthukumaran Kothandaraman [ 21/Apr/16 ]

Hi Robert,

This implies that the order of delivery to Shard would be altered compared to the current order.

Would there be any changes in functional behavior due to this alteration of order? For instance, transaction-related (non-RAFT) shard messages could get reordered relative to replication-related (RAFT) messages.

Regards
Muthu

Comment by Robert Varga [ 21/Apr/16 ]

The ordering does not really matter, as the messages are inherently coming from two unrelated sources: applications and RAFT peers.

As discussed on this week's call, it actually makes sense to make RaftActor a completely separate actor, with Shard just talking to it using messages.

There is no issue for that yet, but if Ashutosh does not open one by Friday, I will open it myself (and make this issue blocked by that one).

Even when we separate the two actors, the interchange is logically split into two classes:

RaftActor <-> RaftActor: ControlMessages, since RAFT convergence is the top priority

Shard -> RaftActor: normal message, just a request to persist a Payload

RaftActor -> Shard: ControlMessages, as these propagate role (Leader/Follower) and we do not want to reorder w.r.t. responses to Payload persistence

* -> Shard: normal messages, as these are work items which generate code and therefore need to be subject to backpressure when RAFT is in flux and cannot keep up. Most notably this will favor delivering Payload response messages before client requests, because without them the system is not making forward progress.

Comment by Tom Pantelis [ 23/May/17 ]

Currently we use the SingleConsumerOnlyUnboundedMailbox for shards which is a "high-performance, multiple producer—single consumer, unbounded MailboxType". But it also says "Currently this queue is slower for some benchmarks than the ConcurrentLinkedQueue from JDK 8 that is used by default, so be sure to measure the performance in your particular setting in order to determine which one to use."

To switch to a ControlAwareMailbox, it looks like we'd go with the UnboundedControlAwareMailbox (I don't see any option other than BoundedControlAwareMailbox). Looking at the Scala code for UnboundedControlAwareMailbox.MessageQueue:

object UnboundedControlAwareMailbox {
  class MessageQueue extends UnboundedControlAwareMessageQueueSemantics with java.io.Serializable {
    val controlQueue: Queue[Envelope] = new ConcurrentLinkedQueue[Envelope]()
    val queue: Queue[Envelope] = new ConcurrentLinkedQueue[Envelope]()
  }
}

it uses ConcurrentLinkedQueue for both queues. There's a slight overhead to check whether the message in the Envelope is a ControlMessage, but that looks negligible.

We can add the configuration with a knob in the akka.conf to select which mailbox to use for shards.

Comment by Tom Pantelis [ 24/May/17 ]

Patches: https://git.opendaylight.org/gerrit/#/q/status:open+project:controller+branch:master+topic:bug/5740

Comment by Robert Varga [ 05/Jun/17 ]

This should help CONTROLLER-1675 too.
