[CONTROLLER-1572] ReadDataReply Message was too large can result in "Received UnreachableMember" in cluster Created: 27/Dec/16  Updated: 25/Jul/23  Resolved: 15/Jul/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: HeYunBo Assignee: Tom Pantelis
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Subchannel-component.doc
External issue ID: 7449

 Description   

If the ReadDataReply message is very large, serializing it is very time-consuming, which can result in "Received UnreachableMember" in the cluster.

The default Akka cluster failure detector triggers if no heartbeats arrive within 5.5s, so "Received UnreachableMember" will certainly occur if the serialization time exceeds 5.5s:

2016-12-27 20:06:30,999 | WARN | t-dispatcher-199 | ClusterCoreDaemon | 179 - com.typesafe.akka.slf4j - 2.4.12 | Cluster Node [akka.tcp://opendaylight-cluster-data@10.46.60.132:2550] - Marking node(s) as UNREACHABLE [Member(address = akka.tcp://opendaylight-cluster-data@10.46.60.139:2550, status = Up)]. Node roles [member-1]
2016-12-27 20:06:31,002 | INFO | t-dispatcher-293 | ShardManager | 214 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Received UnreachableMember: memberName MemberName{name=member-2}, address: akka.tcp://opendaylight-cluster-data@10.46.60.139:2550
2016-12-27 20:06:31,002 | INFO | t-dispatcher-296 | ShardManager | 214 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Received UnreachableMember: memberName MemberName{name=member-2}, address: akka.tcp://opendaylight-cluster-data@10.46.60.139:2550
2016-12-27 20:06:31,002 | INFO | t-dispatcher-199 | EntityOwnershipShard | 209 - org.opendaylight.controller.sal-akka-raft - 1.5.0.SNAPSHOT | member-1-shard-entity-ownership-operational: onPeerDown: PeerDown [memberName=member-2, peerId=member-2-shard-entity-ownership-operational]
2016-12-27 20:06:31,362 | INFO | lt-dispatcher-33 | kka://opendaylight-cluster-data) | 179 - com.typesafe.akka.slf4j - 2.4.12 | Cluster Node [akka.tcp://opendaylight-cluster-data@10.46.60.132:2550] - Ignoring received gossip status from unreachable [UniqueAddress(akka.tcp://opendaylight-cluster-data@10.46.60.139:2550,2101232233)]
2016-12-27 20:06:32,362 | INFO | lt-dispatcher-86 | kka://opendaylight-cluster-data) | 179 - com.typesafe.akka.slf4j - 2.4.12 | Cluster Node [akka.tcp://opendaylight-cluster-data@10.46.60.132:2550] - Ignoring received gossip status from unreachable [UniqueAddress(akka.tcp://opendaylight-cluster-data@10.46.60.139:2550,2101232233)]
2016-12-27 20:06:32,979 | WARN | t-dispatcher-188 | Shard | 209 - org.opendaylight.controller.sal-akka-raft - 1.5.0.SNAPSHOT | member-1-shard-default-config: At least 1 followers need to be active, Switching member-1-shard-default-config from Leader to IsolatedLeader
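
The timing argument above can be illustrated with a simplified fixed-deadline model. This is only a sketch: Akka's cluster failure detector is a phi accrual detector, not a fixed deadline, and the class name `DeadlineSketch` and the 5500 ms timeout (taken from the 5.5s figure quoted in the description) are assumptions made for illustration.

```java
/**
 * Hypothetical, simplified deadline-style failure detector.
 * Not Akka's implementation; it only models "no heartbeat within N ms".
 */
final class DeadlineSketch {
    private final long timeoutMillis;
    private long lastHeartbeatMillis;

    DeadlineSketch(long timeoutMillis, long startMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeatMillis = startMillis;
    }

    /** Records that a heartbeat arrived at the given time. */
    void heartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
    }

    /** The peer counts as reachable while the last heartbeat is recent enough. */
    boolean isReachable(long nowMillis) {
        return nowMillis - lastHeartbeatMillis <= timeoutMillis;
    }

    public static void main(String[] args) {
        DeadlineSketch detector = new DeadlineSketch(5_500L, 0L);
        // Assume serialization of a huge reply delays heartbeats for 6 seconds.
        System.out.println("reachable after 5.0s: " + detector.isReachable(5_000L));
        System.out.println("reachable after 6.0s: " + detector.isReachable(6_000L));
    }
}
```

In this model, any stall longer than the timeout marks the peer unreachable until the next heartbeat arrives, which matches the sequence seen in the log excerpt.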

------------------------------------------------------------------------------------

In addition, the Akka EndpointWriter throws OversizedPayloadException if the ReadDataReply size exceeds maximumPayloadBytes:

2016-12-27 20:06:41,571 | ERROR | lt-dispatcher-18 | EndpointWriter | 179 - com.typesafe.akka.slf4j - 2.4.12 | Transient association error (association remains live)
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Actor[akka.tcp://opendaylight-cluster-data@10.46.60.139:2550/temp/$J]: max allowed size 419430400 bytes, actual size of encoded class org.opendaylight.controller.cluster.datastore.messages.ReadDataReply was 523432856 bytes.
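
To put the two byte counts from this exception in perspective, the following sketch converts them to MiB and computes the overage. The class and method names are hypothetical; only the two numbers come from the log line above.

```java
import java.util.Locale;

/** Hypothetical helper for reading the sizes reported by OversizedPayloadException. */
final class PayloadSizeCheck {
    /** Returns how many bytes the payload exceeds the limit by, or 0 if it fits. */
    static long excessBytes(long actualSize, long maxAllowed) {
        return Math.max(0L, actualSize - maxAllowed);
    }

    /** Formats a byte count as MiB with one decimal place. */
    static String toMiB(long bytes) {
        return String.format(Locale.ROOT, "%.1f MiB", bytes / (1024.0 * 1024.0));
    }

    public static void main(String[] args) {
        long maxAllowed = 419_430_400L; // "max allowed size" from the log
        long actual = 523_432_856L;     // "actual size of encoded class" from the log
        System.out.println("limit:  " + toMiB(maxAllowed));
        System.out.println("actual: " + toMiB(actual));
        System.out.println("excess: " + toMiB(excessBytes(actual, maxAllowed)));
    }
}
```

The configured limit here is exactly 400 MiB, and the encoded ReadDataReply is roughly 99 MiB over it.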



 Comments   
Comment by Tom Pantelis [ 27/Dec/16 ]

This will be alleviated when we switch to use akka Artery (https://git.opendaylight.org/gerrit/#/c/49466), which fragments large messages into smaller chunks (http://blog.akka.io/artery/2016/12/05/aeron-in-artery). Artery also has a dedicated sub channel for large messages that we can utilize for ReadDataReply messages.

Comment by HeYunBo [ 28/Dec/16 ]

I have switched to Akka Artery, but the problem still occurs:

2016-12-28 19:33:05,820 | ERROR | t-dispatcher-152 | Encoder | 179 - com.typesafe.akka.slf4j - 2.4.12 | Failed to serialize oversized message [org.opendaylight.controller.cluster.datastore.messages.ReadDataReply].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.46.60.139:25520/temp/$p]): max allowed size 262144 bytes. Message type [org.opendaylight.controller.cluster.datastore.messages.ReadDataReply].

Comment by Tom Pantelis [ 28/Dec/16 ]

The aeron layer is capable of fragmenting large messages but it seems akka still imposes an upper limit. For artery this appears to be maximum-frame-size, which is 256K by default (http://doc.akka.io/docs/akka/2.4/general/configuration.html#config-akka-remote-artery). We'll need to bump this up.

I haven't seen any way to get around this upper limit other than setting it really high. Perhaps you could engage the akka folks on this subject (mailing list or open an issue)?

Comment by HeYunBo [ 04/Jan/17 ]

I have consulted the Akka developers about this question. They replied as follows:

-----------------------------------------------------------------------------

We recommend against sending large messages. Try to split them into smaller messages or send them via a side channel that is not using Akka remoting.

Note that Artery has some better support for large messages, but the recommendation is still valid.
http://doc.akka.io/docs/akka/2.4/scala/remoting-artery.html#Dedicated_subchannel_for_large_messages
You can increase the max allowed size, see reference.conf

-----------------------------------------------------------------------------

I wonder whether ODL has discussed a plan to split large messages into smaller messages?

Comment by Tom Pantelis [ 04/Jan/17 ]

I don't agree with their view that messages should be split up/chunked at the app layer - this should be handled at the transport layer.

In any event, we can use the large message channel for FE <-> BE messages as discussed in CONTROLLER-1223 once Robert's CONTROLLER-1483 work is complete. I have set both the maximum-frame-size and maximum-large-frame-size to 1G with https://git.opendaylight.org/gerrit/#/c/49466/9/opendaylight/md-sal/sal-clustering-config/src/main/resources/initial/factory-akka.conf.

Ideally we would chunk large ReadDataReply messages or any other message containing NormalizedNodes that could be large - similar to the raft install snapshot chunking but generalized.
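
As a rough illustration of the chunking idea, the sketch below slices an already-serialized message into fixed-size pieces and re-assembles them on the receiving side. It is not the ODL implementation: the class name `MessageSlicingSketch`, the method names, and the in-memory list standing in for the transport are all assumptions, and a real version would also need slice indexes, identifiers, and acknowledgements.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of slicing a serialized message and re-assembling it. */
final class MessageSlicingSketch {
    /** Splits the serialized bytes into slices of at most maxSliceSize bytes. */
    static List<byte[]> slice(byte[] serialized, int maxSliceSize) {
        if (maxSliceSize <= 0) {
            throw new IllegalArgumentException("maxSliceSize must be positive");
        }
        List<byte[]> slices = new ArrayList<>();
        int offset = 0;
        while (offset < serialized.length) {
            int length = Math.min(maxSliceSize, serialized.length - offset);
            slices.add(Arrays.copyOfRange(serialized, offset, offset + length));
            offset += length;
        }
        return slices;
    }

    /** Concatenates slices, assumed to arrive complete and in order. */
    static byte[] reassemble(List<byte[]> slices) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] s : slices) {
            out.write(s, 0, s.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] message = new byte[10];
        for (int i = 0; i < message.length; i++) {
            message[i] = (byte) i;
        }
        List<byte[]> slices = slice(message, 4);
        System.out.println("slices: " + slices.size());
        System.out.println("round trip ok: " + Arrays.equals(message, reassemble(slices)));
    }
}
```

With slices kept below the transport's frame limit, no single remote message would need to approach the maximum payload size.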

Comment by HeYunBo [ 20/Jan/17 ]

Following your advice in https://bugs.opendaylight.org/show_bug.cgi?id=2890, we are considering implementing a subchannel component that can be used for fragmentation and defragmentation. Please refer to the attachment.

Comment by HeYunBo [ 20/Jan/17 ]

Attachment Subchannel-component.doc has been added with description: subchannel component for fragmentation and defragmentation

Comment by Tom Pantelis [ 22/Jun/17 ]

Message slicing/re-assembly patch: https://git.opendaylight.org/gerrit/#/c/55767/

Read reply slicing patch: https://git.opendaylight.org/gerrit/#/q/topic:bug/7449

Generated at Wed Feb 07 19:55:53 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.