[CONTROLLER-2037] Fail to serialize oversized message Created: 05/Apr/22 Updated: 26/May/22 Resolved: 26/May/22 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | 5.0.1, 5.0.2, 5.0.3 |
| Fix Version/s: | 6.0.0, 5.0.5 |
| Type: | Bug | Priority: | Medium |
| Reporter: | Sangwook Ha | Assignee: | Robert Varga |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Description |
|
Found in openflowplugin-csit-3node-clustering-bulkomatic-only-sulfur/190. Serialization of messages larger than the maximum allowed size (524288 bytes) fails and later causes the circuit breaker to time out: 2022-04-04T15:18:46,456 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-28 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.171.35:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-04T15:18:46,979 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-41 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.171.35:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-04T15:18:47,314 | INFO | ForkJoinPool-9-worker-3 | FlowReader | 287 - org.opendaylight.openflowplugin.applications.bulk-o-matic - 0.14.0 | Total Flows read: 10000
2022-04-04T15:18:47,500 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-42 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.171.35:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-04T15:18:48,023 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-40 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.171.35:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-04T15:18:48,233 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-10 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.171.35:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-04T15:18:48,539 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-10 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.171.35:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-04T15:18:49,063 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-6 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.171.35:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-04T15:18:49,579 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-11 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.171.35:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-04T15:18:50,099 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-10 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.171.35:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-04T15:18:50,257 | INFO | opendaylight-cluster-data-shard-dispatcher-38 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config: Initiating snapshot capture CaptureSnapshot [lastAppliedIndex=279998, lastAppliedTerm=5, lastIndex=279999, lastTerm=5, installSnapshotInitiated=, replicatedToAllIndex=-1, replicatedToAllTerm=-1, unAppliedEntries size=319, mandatoryTrim=false]
2022-04-04T15:18:50,259 | INFO | opendaylight-cluster-data-shard-dispatcher-33 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config: Persisting of snapshot done: Snapshot [lastIndex=279999, lastTerm=5, lastAppliedIndex=279998, lastAppliedTerm=5, unAppliedEntries size=319, state=org.opendaylight.controller.cluster.datastore.persisted.ShardSnapshotState@9b0c2c5, electionTerm=5, electionVotedFor=member-2-shard-inventory-config, ServerConfigPayload=null]
2022-04-04T15:18:50,259 | INFO | opendaylight-cluster-data-shard-dispatcher-33 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config: Removed in-memory snapshotted entries, adjusted snaphsotIndex: 279998 and term: 5
2022-04-04T15:18:50,311 | INFO | opendaylight-cluster-data-shard-dispatcher-27 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config: SaveSnapshotSuccess received for snapshot, sequenceNr: 389120
2022-04-04T15:18:50,607 | INFO | opendaylight-cluster-data-shard-dispatcher-35 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config (Leader): Initiating install snapshot to follower member-1-shard-inventory-config: follower nextIndex: 240676, leader snapshotIndex: 279998, leader lastIndex: 281505, leader log size: 1507
2022-04-04T15:18:50,607 | INFO | opendaylight-cluster-data-shard-dispatcher-35 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config: Initiating snapshot capture CaptureSnapshot [lastAppliedIndex=281504, lastAppliedTerm=5, lastIndex=281505, lastTerm=5, installSnapshotInitiated=, replicatedToAllIndex=-1, replicatedToAllTerm=-1, unAppliedEntries size=1, mandatoryTrim=false] to install on member-1-shard-inventory-config
2022-04-04T15:18:50,643 | INFO | opendaylight-cluster-data-shard-dispatcher-35 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config: Persisting of snapshot done: Snapshot [lastIndex=281505, lastTerm=5, lastAppliedIndex=281504, lastAppliedTerm=5, unAppliedEntries size=1, state=org.opendaylight.controller.cluster.datastore.persisted.ShardSnapshotState@4add6906, electionTerm=5, electionVotedFor=member-2-shard-inventory-config, ServerConfigPayload=null]
2022-04-04T15:18:50,644 | INFO | opendaylight-cluster-data-shard-dispatcher-35 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config: Removed in-memory snapshotted entries, adjusted snaphsotIndex: 279998 and term: 5
2022-04-04T15:18:50,680 | INFO | opendaylight-cluster-data-shard-dispatcher-35 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config: SaveSnapshotSuccess received for snapshot, sequenceNr: 392649
2022-04-04T15:18:50,824 | INFO | opendaylight-cluster-data-shard-dispatcher-29 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | member-2-shard-inventory-config (Leader): Snapshot successfully installed on follower member-1-shard-inventory-config (last chunk 2) - matchIndex set to 281504, nextIndex set to 281505
2022-04-04T15:19:00,327 | WARN | opendaylight-cluster-data-akka.actor.default-dispatcher-40 | Shard | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to deleteMessages toSequenceNr [389120] for persistenceId [member-2-shard-inventory-config] due to [null: Circuit Breaker Timed out.].
2022-04-04T15:19:00,329 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-28 | Shard | 206 - org.opendaylight.controller.repackaged-akka - 5.0.1 | Failed to persist event type [org.opendaylight.controller.cluster.raft.persisted.ApplyJournalEntries] with sequence number [392650] for persistenceId [member-2-shard-inventory-config].
akka.pattern.CircuitBreaker$$anon$13: Circuit Breaker Timed out.
2022-04-04T15:19:00,335 | INFO | opendaylight-cluster-data-shard-dispatcher-36 | Shard | 211 - org.opendaylight.controller.sal-clustering-commons - 5.0.1 | Stopping Shard member-2-shard-inventory-config
2022-04-04T15:19:33,600 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-39 | ClientActorBehavior | 199 - org.opendaylight.controller.cds-access-client - 5.0.1 | member-2-frontend-datastore-config: connection ConnectedClientConnection{client=ClientIdentifier{frontend=member-2-frontend-datastore-config, generation=1}, cookie=1, backend=ShardBackendInfo{actor=Actor[akka://opendaylight-cluster-data/user/shardmanager-config/member-2-shard-inventory-config#190559700], sessionId=2, version=SODIUM_SR1, maxMessages=1000, cookie=1, shard=inventory, dataTree=present}} reconnecting as ReconnectingClientConnection{client=ClientIdentifier{frontend=member-2-frontend-datastore-config, generation=1}, cookie=1, backend=ShardBackendInfo{actor=Actor[akka://opendaylight-cluster-data/user/shardmanager-config/member-2-shard-inventory-config#190559700], sessionId=2, version=SODIUM_SR1, maxMessages=1000, cookie=1, shard=inventory, dataTree=present}}
2022-04-04T15:19:33,600 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-39 | ClientActorBehavior | 199 - org.opendaylight.controller.cds-access-client - 5.0.1 | member-2-frontend-datastore-config: refreshing backend for shard 1
2022-04-04T15:19:33,854 | INFO | ForkJoinPool-9-worker-3 | AbstractClientConnection | 199 - org.opendaylight.controller.cds-access-client - 5.0.1 | Capping ReconnectingClientConnection{client=ClientIdentifier{frontend=member-2-frontend-datastore-config, generation=1}, cookie=1, backend=ShardBackendInfo{actor=Actor[akka://opendaylight-cluster-data/user/shardmanager-config/member-2-shard-inventory-config#190559700], sessionId=2, version=SODIUM_SR1, maxMessages=1000, cookie=1, shard=inventory, dataTree=present}} throttle delay from 47 to 5 seconds
java.lang.Throwable: null
at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.sendEntry(AbstractClientConnection.java:262) ~[?:?]
at org.opendaylight.controller.cluster.access.client.SimpleReconnectForwarder.forwardEntry(SimpleReconnectForwarder.java:18) ~[?:?]
at org.opendaylight.controller.cluster.access.client.TransmitQueue.enqueueOrForward(TransmitQueue.java:266) ~[?:?]
at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.enqueueOrForward(AbstractClientConnection.java:183) ~[?:?]
at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.sendEntry(AbstractClientConnection.java:257) ~[?:?]
at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.sendRequest(AbstractClientConnection.java:159) ~[?:?]
at org.opendaylight.controller.cluster.databroker.actors.dds.ProxyHistory.sendRequest(ProxyHistory.java:570) ~[?:?]
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction.sendRequest(AbstractProxyTransaction.java:328) ~[?:?]
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction.abort(AbstractProxyTransaction.java:446) ~[?:?]
at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4772) [?:?]
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractClientHandle.commonAbort(AbstractClientHandle.java:84) [bundleFile:?]
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractClientHandle.abort(AbstractClientHandle.java:70) [bundleFile:?]
at org.opendaylight.controller.cluster.databroker.ClientBackedTransaction.close(ClientBackedTransaction.java:62) [bundleFile:?]
at org.opendaylight.controller.cluster.databroker.ClientBackedReadTransaction.close(ClientBackedReadTransaction.java:52) [bundleFile:?]
at org.opendaylight.controller.cluster.databroker.AbstractDOMBrokerTransaction.closeSubtransactions(AbstractDOMBrokerTransaction.java:87) [bundleFile:?]
at org.opendaylight.controller.cluster.databroker.DOMBrokerReadOnlyTransaction.close(DOMBrokerReadOnlyTransaction.java:47) [bundleFile:?]
at org.opendaylight.mdsal.binding.dom.adapter.BindingDOMReadTransactionAdapter.close(BindingDOMReadTransactionAdapter.java:46) [bundleFile:?]
at org.opendaylight.openflowplugin.applications.bulk.o.matic.FlowReader.readFlowsX(FlowReader.java:101) [bundleFile:?]
at org.opendaylight.openflowplugin.applications.bulk.o.matic.FlowReader.run(FlowReader.java:67) [bundleFile:?]
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426) [?:?]
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?]
at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) [?:?]
at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?]
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) [?:?]
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) [?:?]
2022-04-04T15:19:38,617 | WARN | ForkJoinPool.commonPool-worker-3 | AbstractShardBackendResolver | 213 - org.opendaylight.controller.sal-distributed-datastore - 5.0.1 | Failed to resolve shard
java.util.concurrent.TimeoutException: Connection attempt failed
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.wrap(AbstractShardBackendResolver.java:151) ~[?:?]
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.onConnectResponse(AbstractShardBackendResolver.java:168) ~[?:?]
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.lambda$connectShard$4(AbstractShardBackendResolver.java:161) ~[?:?]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) [?:?]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) [?:?]
at java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479) [?:?]
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?]
at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) [?:?]
at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?]
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) [?:?]
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) [?:?]
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-2-shard-inventory-config#190559700)]] after [5000 ms]. Message of type [org.opendaylight.controller.cluster.access.commands.ConnectClientRequest]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
2022-04-04T15:19:38,857 | INFO | ForkJoinPool-9-worker-3 | AbstractClientConnection | 199 - org.opendaylight.controller.cds-access-client - 5.0.1 | Capping ReconnectingClientConnection{client=ClientIdentifier{frontend=member-2-frontend-datastore-config, generation=1}, cookie=1, backend=ShardBackendInfo{actor=Actor[akka://opendaylight-cluster-data/user/shardmanager-config/member-2-shard-inventory-config#190559700], sessionId=2, version=SODIUM_SR1, maxMessages=1000, cookie=1, shard=inventory, dataTree=present}} throttle delay from 43 to 5 seconds
java.lang.Throwable: null
at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.sendEntry(AbstractClientConnection.java:262) ~[?:?]
at org.opendaylight.controller.cluster.access.client.SimpleReconnectForwarder.forwardEntry(SimpleReconnectForwarder.java:18) ~[?:?]
at org.opendaylight.controller.cluster.access.client.TransmitQueue.enqueueOrForward(TransmitQueue.java:266) ~[?:?]
at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.enqueueOrForward(AbstractClientConnection.java:183) ~[?:?]
at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.sendEntry(AbstractClientConnection.java:257) ~[?:?]
at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.sendRequest(AbstractClientConnection.java:159) ~[?:?]
at org.opendaylight.controller.cluster.databroker.actors.dds.ProxyHistory.sendRequest(ProxyHistory.java:570) ~[?:?]
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction.sendRequest(AbstractProxyTransaction.java:328) ~[?:?]
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction.abort(AbstractProxyTransaction.java:446) ~[?:?]
at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4772) [?:?]
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractClientHandle.commonAbort(AbstractClientHandle.java:84) [bundleFile:?]
at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractClientHandle.abort(AbstractClientHandle.java:70) [bundleFile:?]
at org.opendaylight.controller.cluster.databroker.ClientBackedTransaction.close(ClientBackedTransaction.java:62) [bundleFile:?]
at org.opendaylight.controller.cluster.databroker.ClientBackedReadTransaction.close(ClientBackedReadTransaction.java:52) [bundleFile:?]
at org.opendaylight.controller.cluster.databroker.AbstractDOMBrokerTransaction.closeSubtransactions(AbstractDOMBrokerTransaction.java:87) [bundleFile:?]
at org.opendaylight.controller.cluster.databroker.DOMBrokerReadOnlyTransaction.close(DOMBrokerReadOnlyTransaction.java:47) [bundleFile:?]
at org.opendaylight.mdsal.binding.dom.adapter.BindingDOMReadTransactionAdapter.close(BindingDOMReadTransactionAdapter.java:46) [bundleFile:?]
at org.opendaylight.openflowplugin.applications.bulk.o.matic.FlowReader.readFlowsX(FlowReader.java:101) [bundleFile:?]
at org.opendaylight.openflowplugin.applications.bulk.o.matic.FlowReader.run(FlowReader.java:67) [bundleFile:?]
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426) [?:?]
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?]
at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) [?:?]
at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?]
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) [?:?]
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) [?:?]
|
| Comments |
| Comment by Sangwook Ha [ 27/Apr/22 ] |
|
This is still found with v5.0.3: 2022-04-27T05:54:07,395 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-19 | Encoder | 206 - org.opendaylight.controller.repackaged-akka - 5.0.2 | Failed to serialize oversized message [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
akka.remote.OversizedPayloadException: Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.170.139:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].
2022-04-27T05:54:07,411 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-19 | RemoteActorRef | 206 - org.opendaylight.controller.repackaged-akka - 5.0.2 | Message [org.opendaylight.controller.cluster.raft.messages.AppendEntries] wrapped in [akka.actor.ActorSelectionMessage] from Actor[akka://opendaylight-cluster-data/user/shardmanager-config/member-1-shard-inventory-config#878229052] to Actor[akka://opendaylight-cluster-data@10.30.170.139:2550/] was dropped. Discarding oversized payload sent to Some(Actor[akka://opendaylight-cluster-data@10.30.170.139:2550/]): max allowed size 524288 bytes. Message type [ActorSelectionMessage(org.opendaylight.controller.cluster.raft.messages.AppendEntries)].. [9] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. |
| Comment by Robert Varga [ 11/May/22 ] |
|
So this is just AppendEntries, other messages seem to be okay. |
| Comment by Robert Varga [ 11/May/22 ] |
|
The circuit breaker seems to be an environment problem, as tracked in CONTROLLER-2043 |
| Comment by Robert Varga [ 12/May/22 ] |
|
This seems to be a problem with accounting for the serialized size. The current tuning gives us 512KiB for buffers and selects up to 480KiB in payloads – i.e. it leaves 32KiB of headroom for wrapping and other serialization overhead. That might not be enough, as the format is generally this:
out.writeShort(appendEntries.leaderRaftVersion);
out.writeLong(appendEntries.getTerm());
out.writeObject(appendEntries.leaderId);
out.writeLong(appendEntries.prevLogTerm);
out.writeLong(appendEntries.prevLogIndex);
out.writeLong(appendEntries.leaderCommit);
out.writeLong(appendEntries.replicatedToAllIndex);
out.writeShort(appendEntries.payloadVersion);
out.writeInt(appendEntries.entries.size());
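// each entry below costs 16 bytes of index/term plus the writeObject() encapsulation of its payload data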
for (ReplicatedLogEntry e: appendEntries.entries) {
out.writeLong(e.getIndex());
out.writeLong(e.getTerm());
out.writeObject(e.getData());
}
out.writeObject(appendEntries.leaderAddress);
Now the number of entries we select is dominated by
final int maxDataSize = context.getConfigParams().getSnapshotChunkSize();
and AbstractReplicatedLogImpl uses ReplicatedLogEntry.size() to count against that limit. That does not include the 16 bytes spent on each entry for term/index, nor the encapsulation overhead of writeObject() – hence we can select more entries than we can actually serialize into the 512KiB buffer. The second part is that Payload.size() does not document its semantics at all. Looking at the implementations, there is actually a major disconnect between them:
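A minimal sketch of how selection could budget for that per-entry overhead in addition to ReplicatedLogEntry.size() – EntrySelectionSketch, ENTRY_OVERHEAD and selectEntries() are illustrative names, not the controller's actual API:
import java.util.ArrayList;
import java.util.List;

import org.opendaylight.controller.cluster.raft.ReplicatedLogEntry;

final class EntrySelectionSketch {
    // Assumed per-entry allowance: 16 bytes of index/term plus a rough guess at
    // the writeObject() encapsulation; the real overhead depends on the serializer.
    private static final int ENTRY_OVERHEAD = 16 + 64;

    static List<ReplicatedLogEntry> selectEntries(final List<ReplicatedLogEntry> candidates,
            final int maxDataSize) {
        final List<ReplicatedLogEntry> batch = new ArrayList<>();
        int budget = maxDataSize;
        for (ReplicatedLogEntry entry : candidates) {
            final int cost = entry.size() + ENTRY_OVERHEAD;
            // Always take at least one entry; otherwise stop before the batch
            // would overflow the serialization buffer.
            if (cost > budget && !batch.isEmpty()) {
                break;
            }
            batch.add(entry);
            budget -= cost;
        }
        return batch;
    }
}
With the framing folded into the budget, a batch that fits the size() accounting should also fit the 512KiB buffer, at the cost of slightly smaller batches.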
|
| Comment by Robert Varga [ 13/May/22 ] |
|
So, investigating this a bit more: size() here is meant to be an estimate of heap residency, not the persistent (serialized) size – which currently has no expression at all.
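If the persistent size needs its own expression, a minimal sketch of what that could look like – SizedPayload and serializedSize() are hypothetical, the existing Payload contract only exposes size():
public interface SizedPayload {
    // Estimate of heap residency, matching how size() is interpreted today.
    int size();

    // Estimated number of bytes the payload occupies once serialized.
    int serializedSize();
}
|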