[CONTROLLER-1698] node unreachable if odl under load Created: 25/May/17  Updated: 25/Jul/23  Resolved: 25/May/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Peter Gubka Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: File logs_616.tar.gz    
Issue Links:
Duplicate
duplicates CONTROLLER-1645 shard moved during 1M bgp prefix adve... Confirmed
External issue ID: 8550

 Description   

This bug tracks the problem of unreachable nodes, which happens often, especially when ODL is under load (e.g. the BGP suite with 300k prefixes, or when a lot of transactions are produced on the nodes).

The scenario in the attached logs is simple.
1) produce-transactions RPCs are started from all 3 nodes around 11:15:02.
2) On node1, around 11:15:12.xxx, node3 is marked as unreachable.

Focusing on the heartbeats: they are sent regularly, but under load the missing responses all arrive at once. E.g. on node3:

2017-05-24 11:15:06,045 | DEBUG | lt-dispatcher-76 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat to [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:06,047 | DEBUG | lt-dispatcher-76 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat response from [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:07,055 | DEBUG | lt-dispatcher-47 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat to [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:08,055 | DEBUG | lt-dispatcher-34 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat to [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:09,056 | DEBUG | lt-dispatcher-77 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat to [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:11,977 | DEBUG | lt-dispatcher-79 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat to [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:11,978 | DEBUG | lt-dispatcher-37 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat to [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:12,045 | DEBUG | lt-dispatcher-54 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat to [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:13,056 | DEBUG | lt-dispatcher-79 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat to [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:15,419 | DEBUG | lt-dispatcher-78 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat to [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:15,428 | DEBUG | lt-dispatcher-83 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat to [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:16,055 | DEBUG | lt-dispatcher-75 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat to [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:17,055 | DEBUG | lt-dispatcher-78 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat to [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:17,992 | DEBUG | lt-dispatcher-20 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat response from [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:17,993 | DEBUG | lt-dispatcher-20 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat response from [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:18,003 | DEBUG | lt-dispatcher-43 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat response from [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:18,014 | DEBUG | ult-dispatcher-3 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat response from [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:18,014 | DEBUG | ult-dispatcher-3 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat response from [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:18,015 | DEBUG | ult-dispatcher-3 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat response from [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:18,020 | DEBUG | lt-dispatcher-43 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat response from [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:18,027 | DEBUG | lt-dispatcher-37 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat response from [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:18,028 | DEBUG | lt-dispatcher-78 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat response from [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
2017-05-24 11:15:18,031 | DEBUG | lt-dispatcher-78 | ClusterHeartbeatSender | 201 - com.typesafe.akka.slf4j - 2.4.17 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.6:2550] - Heartbeat response from [akka.tcp://opendaylight-cluster-data@172.17.0.4:2550]
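The pattern in the excerpt above (heartbeats sent roughly every second, responses arriving in a burst) can be measured directly from the log. The following is a minimal sketch, assuming the karaf log line layout shown above; the function names `parse_heartbeats` and `longest_response_gap` are hypothetical helpers, not part of ODL or Akka.

```python
import re
from datetime import datetime

# Assumption: lines follow the layout in the excerpt above, i.e. a leading
# "YYYY-MM-DD HH:MM:SS,mmm" timestamp and a trailing
# "Heartbeat to [peer]" or "Heartbeat response from [peer]" message.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \|.*ClusterHeartbeatSender.*"
    r" - (Heartbeat to|Heartbeat response from) \[([^\]]+)\]\s*$"
)


def parse_heartbeats(lines):
    """Return (timestamp, kind, peer) tuples; kind is 'sent' or 'response'."""
    events = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a ClusterHeartbeatSender heartbeat line
        timestamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        kind = "sent" if match.group(2) == "Heartbeat to" else "response"
        events.append((timestamp, kind, match.group(3)))
    return events


def longest_response_gap(events, peer):
    """Longest time in seconds between two consecutive responses from peer."""
    times = [ts for ts, kind, p in events if kind == "response" and p == peer]
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
    return max(gaps, default=0.0)
```

Feeding the lines of a node's log to `parse_heartbeats` and then calling `longest_response_gap` for one peer address gives the longest period without a heartbeat response from that peer. In the node3 excerpt above, no response from 172.17.0.4 is logged between 11:15:06,047 and 11:15:17,992, a gap of about 11.9 seconds.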



 Comments   
Comment by Peter Gubka [ 25/May/17 ]

Attachment logs_616.tar.gz has been added with description: odl logs

Generated at Wed Feb 07 19:56:13 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.