[CONTROLLER-1645] shard moved during 1M bgp prefix advertizing (with tell-based=true) Created: 27/Apr/17 Updated: 25/Jul/23 |
| Status: | Confirmed |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug |
| Reporter: | Peter Gubka | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Operating System: All |
| Issue Links: | |
| External issue ID: | 8318 |
| Description |
|
https://jenkins.opendaylight.org/sandbox/job/bgpcep-csit-3node-periodic-bgpclustering-only-carbon/8/

After the BGP peer is connected, 1M prefixes are advertised. From the log.html we can see that updates for the 1M prefixes were sent, starting at

2017-04-27 07:36:16,151 INFO BGP-Dummy-1: Common single UPDATE will be generated for both NLRI & WITHDRAWN lists

During this time there was a shard leader movement. Finally, only 26k prefixes are present in ODL. |
| Comments |
| Comment by Vratko Polak [ 27/Apr/17 ] |
|
> https://jenkins.opendaylight.org/sandbox/job/bgpcep-csit-3node-periodic-bgpclustering-only-carbon/8/

Copied a segment of member-1 karaf.log up to when it marked member-3 unreachable. Note there are a few troubling INFOs and WARNs before the BGP peer has even connected.

2017-04-27 07:34:34,620 | INFO | config-pusher | ConfigPusherImpl | 126 - org.opendaylight.controller.config-persister-impl - 0.6.0.SNAPSHOT | Pushing configuration snapshot 10-rest-connector.xml(odl-restconf,odl-restconf)
2017-04-27 07:34:34,733 | INFO | lt-dispatcher-40 | ClientActorBehavior | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.SNAPSHOT | member-1-frontend-datastore-operational: resolving connection ConnectingClientConnection{client=ClientIdentifier {frontend=member-1-frontend-datastore-operational, generation=0}, cookie=0} to ConnectedClientConnection{client=ClientIdentifier{frontend=member-1-frontend-datastore-operational, generation=0}, cookie=0, backend=ShardBackendInfo{actor=Actorakka://opendaylight-cluster-data/user/shardmanager-operational/member-1-shard-default-operational#2033146177, sessionId=1, version=BORON, maxMessages=1000, cookie=0, shard=default}} , cookie=0, backend=ShardBackendInfo{actor=Actorakka://opendaylight-cluster-data/user/shardmanager-operational/member-1-shard-default-operational#2033146177, sessionId=1, version=BORON, maxMessages=1000, cookie=0, shard=default}}
2017-04-27 07:36:14,219 | INFO | lt-dispatcher-21 | ClientActorBehavior | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.SNAPSHOT | member-1-frontend-datastore-operational: resolving connection ConnectingClientConnection{client=ClientIdentifier {frontend=member-1-frontend-datastore-operational, generation=0}, cookie=2} to ConnectedClientConnection{client=ClientIdentifier{frontend=member-1-frontend-datastore-operational, generation=0}, cookie=2, backend=ShardBackendInfo{actor=Actorakka://opendaylight-cluster-data/user/shardmanager-operational/member-1-shard-entity-ownership-operational#-171190699, sessionId=2, version=BORON, maxMessages=1000, cookie=2, shard=entity-ownership}} , cookie=2, backend=ShardBackendInfo{actor=Actorakka://opendaylight-cluster-data/user/shardmanager-operational/member-1-shard-entity-ownership-operational#-171190699, sessionId=2, version=BORON, maxMessages=1000, cookie=2, shard=entity-ownership}}
2017-04-27 07:36:15,078 | INFO | h for user karaf | command | 320 - org.apache.karaf.log.command - 3.0.8 | ROBOT MESSAGE: Starting test Check_For_Empty_Ipv4_Topology_Before_Talking_3
2017-04-27 07:36:15,426 | INFO | h for user karaf | command | 320 - org.apache.karaf.log.command - 3.0.8 | ROBOT MESSAGE: Starting test Reconfigure_ODL_To_Accept_Connection
2017-04-27 07:36:16,422 | INFO | on-dispatcher-50 | BgpPeer | 271 - org.opendaylight.bgpcep.bgp-rib-impl - 0.7.0.SNAPSHOT | Peer Singleton Service example-bgp-rib-service-group registered
2017-04-27 07:36:23,929 | INFO | h for user karaf | command | 320 - org.apache.karaf.log.command - 3.0.8 | ROBOT MESSAGE: Starting test Wait_For_Stable_Talking_Ipv4_Topology_1 |
| Comment by Vratko Polak [ 27/Apr/17 ] |
|
Member-3 log is basically the same.

2017-04-27 07:34:34,121 | INFO | config-pusher | ConfigPusherImpl | 126 - org.opendaylight.controller.config-persister-impl - 0.6.0.SNAPSHOT | Pushing configuration snapshot 10-rest-connector.xml(odl-restconf,odl-restconf)
2017-04-27 07:34:34,363 | INFO | lt-dispatcher-19 | ClientActorBehavior | 197 - org.opendaylight.controller.cds-access-client - 1.1.0.SNAPSHOT | member-2-frontend-datastore-operational: resolving connection ConnectingClientConnection{client=ClientIdentifier {frontend=member-2-frontend-datastore-operational, generation=0}, cookie=0} to ConnectedClientConnection{client=ClientIdentifier{frontend=member-2-frontend-datastore-operational, generation=0}, cookie=0, backend=ShardBackendInfo{actor=Actor[akka.tcp://opendaylight-cluster-data@10.29.13.130:2550/user/shardmanager-operational/member-1-shard-default-operational#2033146177], sessionId=1, version=BORON, maxMessages=1000, cookie=0, shard=default}} , cookie=0, backend=ShardBackendInfo{actor=Actor[akka.tcp://opendaylight-cluster-data@10.29.13.130:2550/user/shardmanager-operational/member-1-shard-default-operational#2033146177], sessionId=1, version=BORON, maxMessages=1000, cookie=0, shard=default}}
2017-04-27 07:34:34,680 | INFO | config-pusher | ConfigPusherImpl | 126 - org.opendaylight.controller.config-persister-impl - 0.6.0.SNAPSHOT | Successfully pushed configuration snapshot 10-rest-connector.xml(odl-restconf,odl-restconf)
2017-04-27 07:34:41,039 | WARN | lt-dispatcher-17 | FrontendReadOnlyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Rejecting unsupported request ModifyTransactionRequest {target=member-3-datastore-config-fe-0-txn-12-0, sequence=1, replyTo=Actor[akka.tcp://opendaylight-cluster-data@10.29.14.105:2550/user/$a#-1001685412], operations=[], protocol=ABORT}
2017-04-27 07:35:47,914 | INFO | 4]-nio2-thread-1 | ServerSession | 30 - org.apache.sshd.core - 0.14.0 | Server session created from /10.29.4.11:35174
2017-04-27 07:36:11,518 | WARN | lt-dispatcher-34 | FrontendReadOnlyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Rejecting unsupported request ModifyTransactionRequest {target=member-3-datastore-config-fe-0-txn-13-0, sequence=1, replyTo=Actor[akka.tcp://opendaylight-cluster-data@10.29.14.105:2550/user/$a#-1001685412], operations=[], protocol=ABORT}
2017-04-27 07:36:11,675 | WARN | lt-dispatcher-34 | FrontendReadOnlyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Rejecting unsupported request ModifyTransactionRequest {target=member-1-datastore-config-fe-0-txn-16-0, sequence=1, replyTo=Actor[akka.tcp://opendaylight-cluster-data@10.29.13.130:2550/user/$a#699258237], operations=[], protocol=ABORT}
2017-04-27 07:36:11,974 | WARN | lt-dispatcher-17 | FrontendReadOnlyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Rejecting unsupported request ModifyTransactionRequest {target=member-3-datastore-config-fe-0-txn-14-0, sequence=1, replyTo=Actor[akka.tcp://opendaylight-cluster-data@10.29.14.105:2550/user/$a#-1001685412], operations=[], protocol=ABORT}
2017-04-27 07:36:12,141 | WARN | lt-dispatcher-17 | FrontendReadOnlyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Rejecting unsupported request ModifyTransactionRequest {target=member-1-datastore-config-fe-0-txn-17-0, sequence=1, replyTo=Actor[akka.tcp://opendaylight-cluster-data@10.29.13.130:2550/user/$a#699258237], operations=[], protocol=ABORT}
2017-04-27 07:36:12,435 | WARN | lt-dispatcher-17 | FrontendReadOnlyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Rejecting unsupported request ModifyTransactionRequest {target=member-3-datastore-config-fe-0-txn-15-0, sequence=1, replyTo=Actor[akka.tcp://opendaylight-cluster-data@10.29.14.105:2550/user/$a#-1001685412], operations=[], protocol=ABORT}
2017-04-27 07:36:12,595 | WARN | lt-dispatcher-17 | FrontendReadOnlyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Rejecting unsupported request ModifyTransactionRequest {target=member-1-datastore-config-fe-0-txn-18-0, sequence=1, replyTo=Actor[akka.tcp://opendaylight-cluster-data@10.29.13.130:2550/user/$a#699258237], operations=[], protocol=ABORT}
2017-04-27 07:36:12,929 | WARN | lt-dispatcher-16 | FrontendReadOnlyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Rejecting unsupported request ModifyTransactionRequest {target=member-3-datastore-config-fe-0-txn-16-0, sequence=1, replyTo=Actor[akka.tcp://opendaylight-cluster-data@10.29.14.105:2550/user/$a#-1001685412], operations=[], protocol=ABORT}
2017-04-27 07:36:13,090 | WARN | lt-dispatcher-34 | FrontendReadOnlyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Rejecting unsupported request ModifyTransactionRequest {target=member-1-datastore-config-fe-0-txn-19-0, sequence=1, replyTo=Actor[akka.tcp://opendaylight-cluster-data@10.29.13.130:2550/user/$a#699258237], operations=[], protocol=ABORT}
2017-04-27 07:36:13,380 | WARN | lt-dispatcher-14 | FrontendReadOnlyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Rejecting unsupported request ModifyTransactionRequest {target=member-3-datastore-config-fe-0-txn-17-0, sequence=1, replyTo=Actor[akka.tcp://opendaylight-cluster-data@10.29.14.105:2550/user/$a#-1001685412], operations=[], protocol=ABORT}
2017-04-27 07:36:13,552 | WARN | lt-dispatcher-14 | FrontendReadOnlyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Rejecting unsupported request ModifyTransactionRequest {target=member-1-datastore-config-fe-0-txn-20-0, sequence=1, replyTo=Actor[akka.tcp://opendaylight-cluster-data@10.29.13.130:2550/user/$a#699258237], operations=[], protocol=ABORT}
2017-04-27 07:36:13,818 | WARN | lt-dispatcher-14 | FrontendReadOnlyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Rejecting unsupported request ModifyTransactionRequest {target=member-3-datastore-config-fe-0-txn-18-0, sequence=1, replyTo=Actor[akka.tcp://opendaylight-cluster-data@10.29.14.105:2550/user/$a#-1001685412], operations=[], protocol=ABORT}
2017-04-27 07:36:14,053 | INFO | h for user karaf | command | 320 - org.apache.karaf.log.command - 3.0.8 | ROBOT MESSAGE: Starting test Get Example Bgp Rib Owner
2017-04-27 07:36:14,455 | INFO | h for user karaf | command | 320 - org.apache.karaf.log.command - 3.0.8 | ROBOT MESSAGE: Starting test Check_For_Empty_Ipv4_Topology_Before_Talking_1
2017-04-27 07:36:14,810 | INFO | h for user karaf | command | 320 - org.apache.karaf.log.command - 3.0.8 | ROBOT MESSAGE: Starting test Check_For_Empty_Ipv4_Topology_Before_Talking_2
2017-04-27 07:36:15,507 | INFO | h for user karaf | command | 320 - org.apache.karaf.log.command - 3.0.8 | ROBOT MESSAGE: Starting test Reconfigure_ODL_To_Accept_Connection
2017-04-27 07:36:24,714 | WARN | lt-dispatcher-19 | FrontendReadOnlyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Rejecting unsupported request ModifyTransactionRequest {target=member-3-datastore-config-fe-0-txn-20-0, sequence=1, replyTo=Actor[akka.tcp://opendaylight-cluster-data@10.29.14.105:2550/user/$a#-1001685412], operations=[], protocol=ABORT}
2017-04-27 07:36:24,876 | WARN | lt-dispatcher-17 | Shard | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-2-shard-default-operational (Follower): Missing index 357 from log. Cannot apply state. Ignoring 357 to 360
2017-04-27 07:36:25,450 | WARN | lt-dispatcher-42 | FrontendReadOnlyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Rejecting unsupported request ModifyTransactionRequest {target=member-3-datastore-config-fe-0-txn-21-0, sequence=1, replyTo=Actor[akka.tcp://opendaylight-cluster-data@10.29.14.105:2550/user/$a#-1001685412], operations=[], protocol=ABORT}
2017-04-27 07:36:30,335 | WARN | lt-dispatcher-42 | Shard | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-2-shard-default-operational (Follower): Missing index 364 from log. Cannot apply state. Ignoring 364 to 366
2017-04-27 07:36:30,618 | WARN | lt-dispatcher-44 | Shard | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-2-shard-default-operational (Follower): Missing index 367 from log. Cannot apply state. Ignoring 367 to 372
2017-04-27 07:36:36,172 | WARN | lt-dispatcher-43 | Shard | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-2-shard-default-operational (Follower): Missing index 368 from log. Cannot apply state. Ignoring 368 to 376
2017-04-27 07:36:39,197 | WARN | lt-dispatcher-16 | FrontendReadOnlyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Rejecting unsupported request ModifyTransactionRequest {target=member-3-datastore-config-fe-0-txn-23-0, sequence=1, replyTo=Actor[akka.tcp://opendaylight-cluster-data@10.29.14.105:2550/user/$a#-1001685412], operations=[], protocol=ABORT}
2017-04-27 07:36:42,572 | WARN | lt-dispatcher-35 | Shard | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-2-shard-default-operational (Follower): Missing index 369 from log. Cannot apply state. Ignoring 369 to 380
2017-04-27 07:36:46,345 | WARN | lt-dispatcher-17 | FrontendReadOnlyTransaction | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Rejecting unsupported request ModifyTransactionRequest {target=member-3-datastore-config-fe-0-txn-24-0, sequence=1, replyTo=Actor[akka.tcp://opendaylight-cluster-data@10.29.14.105:2550/user/$a#-1001685412], operations=[], protocol=ABORT}
2017-04-27 07:36:54,182 | INFO | lt-dispatcher-37 | ShardManager | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Received UnreachableMember: memberName MemberName {name=member-1}, address: akka.tcp://opendaylight-cluster-data@10.29.13.130:2550 |
| Comment by Vratko Polak [ 27/Apr/17 ] |
|
> bgpcep-csit-3node-periodic-bgpclustering-only-carbon/8/ |
| Comment by Vratko Polak [ 11/May/17 ] |
|
This is still happening, also with the scale reduced to 300k prefixes. See the logs at [2], a few seconds after 17:13:00. Member-1 and member-3 mark member-2 as unreachable without an apparent reason; subsequent leader movements even result in a "relevant data model content does not exist" [3] response.

[2] https://logs.opendaylight.org/releng/jenkins092/bgpcep-csit-3node-periodic-bgpclustering-only-carbon/264/archives/ |
| Comment by Tom Pantelis [ 11/May/17 ] |
|
Akka cluster will mark nodes as unreachable because:

1) Actual network connectivity has been lost
2) Long GC pauses are delaying heartbeats
3) The node is flooded with so many messages that heartbeats are delayed

For #2, you should enable GC logging and check for long pauses, and then possibly tweak JVM memory/GC settings if needed. For #3, well... not sure there's much we can really do about that right now, except maybe more throttling of transactions, until we can enable artery, which promises to alleviate such problems.

(In reply to Vratko Polák from comment #4) |
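For reference, a minimal sketch of enabling the GC logging suggested above on a JDK 8 HotSpot JVM. The log path and the use of Karaf's bin/setenv EXTRA_JAVA_OPTS hook are assumptions for illustration, not something stated in this issue:

```sh
# Hypothetical addition to bin/setenv (JDK 8 HotSpot): write a GC log so long
# pauses can be correlated with the unreachability events in karaf.log.
export EXTRA_JAVA_OPTS="-Xloggc:/tmp/odl-gc.log \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -XX:+PrintGCApplicationStoppedTime"
```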
| Comment by Robert Varga [ 12/May/17 ] |
|
Given the symptoms, I think we are seeing 3), except it is not a lot of messages but rather a few large messages. With an unthrottled BGP peer, the tell-based protocol can pick up 1M prefixes in about 12 seconds (as opposed to ~30 seconds) and it aggressively reuses transactions, i.e. the entire work is done in ~60 transactions rather than ~40K transactions. This is caused by a much straighter path in the local transaction case – hence the balance between the time to commit a transaction and the time to jam a modification into it is shifted. This results in AppendEntries messages which are huge and end up monopolizing the channel – and that is what causes akka to go crazy. The long-term solution is to go for artery with fragmentation. For the short term we have essentially two options:
|
| Comment by Robert Varga [ 12/May/17 ] |
|
http://doc.akka.io/docs/akka/2.4/java/cluster-usage.html#Failure_Detector is something we should be able to tune for the BGP case. |
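A sketch of what such tuning could look like in the cluster's akka.conf, assuming the defaults documented for Akka 2.4 (heartbeat-interval 1 s, threshold 8.0, acceptable-heartbeat-pause 3 s); the concrete values below are illustrative only, not a recommendation from this issue:

```hocon
akka {
  cluster {
    failure-detector {
      # How often heartbeats are sent (Akka 2.4 default: 1 s).
      heartbeat-interval = 1 s
      # Phi threshold above which a member is considered unreachable (default: 8.0).
      threshold = 12.0
      # How long missing heartbeats are tolerated before phi spikes (default: 3 s).
      # Raising this tolerates pauses caused by large AppendEntries batches,
      # at the cost of slower detection of genuine node failures.
      acceptable-heartbeat-pause = 16 s
    }
  }
}
```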
| Comment by Tom Pantelis [ 12/May/17 ] |
|
I'm working on the re-usable fragmentation now so we can switch to artery. When sending AppendEntries, we already limit the total payload entries size to the snapshot chunk size of 2M. However, if a single payload exceeds the chunk size then we send it as is, b/c we don't currently have a mechanism to slice/chunk an AppendEntries. I assume we're seeing large single CommitTransactionPayloads. The maximum-frame-size we currently set for akka messages is ~400M, so we're not exceeding that (otherwise we'd see serialization failures). |
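For context, the frame size mentioned above corresponds to the classic (netty.tcp) remoting setting in akka.conf; a hedged illustration, with 419430400 (~400M) used purely as an example value matching the figure quoted in the comment rather than a verified copy of the shipped configuration:

```hocon
akka {
  remote {
    netty.tcp {
      # Upper bound on a single serialized remote message; anything larger
      # fails serialization instead of being sent.
      maximum-frame-size = 419430400
    }
  }
}
```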
| Comment by Vratko Polak [ 18/May/17 ] |
|
Just noting one more symptom visible to Robot [5] when this bug happens. This is expected if the example-ipv4-topology is governed by a Singleton service which is currently moving between members (having deleted the whole topology for the time being). |
| Comment by Vratko Polak [ 23/May/17 ] |
|
Just adding a symptom visible from Robot [6]: |
| Comment by Ajay L [ 15/Sep/17 ] |
|
(In reply to Tom Pantelis from comment #8) Tom, is this fix available to test yet? Thx |
| Comment by Tom Pantelis [ 15/Sep/17 ] |
|
(In reply to Ajay L from comment #12) yes - the AppendEntries chunking has been implemented. |
| Comment by Ajay L [ 19/Sep/17 ] |
|
(In reply to Tom Pantelis from comment #13)

Thx Tom. We ran the BGP 300k prefix test on master and it still fails for the scenario where the shard leader and the RIB entity owner are on different nodes.

Is the AppendEntries chunking expected to directly help with the late heartbeats, or is it something that was done as a prerequisite for artery support, with the benefits expected to kick in only when artery is used? (Artery seems to be disabled currently because of https://bugs.opendaylight.org/show_bug.cgi?id=7518.)

Increasing the akka.cluster.failure-detector.acceptable-heartbeat-pause value did produce better results for the above test. Do you think this is a reasonable adjustment for this scenario? I guess the downside is that it will have an adverse impact on detecting genuine node-down cases. |
| Comment by Tom Pantelis [ 19/Sep/17 ] |
|
You need to enable the tell-based protocol (the use-tell-based-protocol setting) and see what the results are. Then try with artery enabled. You'll have to fiddle with the akka maximum-frame-size and buffer-pool-size settings and our maximum-message-slice-size setting. The latter will have to be some cushion (maybe like 10K) less than maximum-frame-size. |
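A rough sketch of the knobs listed above, assuming the tell-based switch lives in etc/org.opendaylight.controller.cluster.datastore.cfg and the artery settings go into akka.conf. The artery key names follow the Akka documentation; the numbers are placeholders chosen only so that maximum-message-slice-size stays roughly a 10K cushion below the frame size, as suggested in the comment:

```properties
# etc/org.opendaylight.controller.cluster.datastore.cfg (illustrative values)
use-tell-based-protocol=true
# Keep this a cushion (here ~10K) below the akka maximum-frame-size.
maximum-message-slice-size=2086912
```

```hocon
# akka.conf (artery enabled; illustrative values)
akka {
  remote {
    artery {
      enabled = on
      advanced {
        # Must exceed maximum-message-slice-size by some cushion.
        maximum-frame-size = 2 MiB
        # Number of pre-allocated send buffers of maximum-frame-size each.
        buffer-pool-size = 128
      }
    }
  }
}
```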
| Comment by Ajay L [ 19/Sep/17 ] |
|
The above test already uses tell-based mode. Without that, the numbers are much worse, so tell-based definitely helps. We will tinker with the other settings and share the findings. |