[CONTROLLER-1862] Remote connection to [null] failed with java.net.ConnectException: Connection refused: /10.30.170.99:2550 Created: 17/Sep/18 Updated: 19/Sep/18 |
|
| Status: | Open |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Medium |
| Reporter: | Sam Hague | Assignee: | Sam Hague |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
During 3-node testing using graceful start and stop, the exceptions below are seen. "Graceful" start and stop means using the bin/stop and bin/start commands to stop and start ODL rather than kill -9, so the bundles get an orderly stop in which each bundle is destroyed.

The flow: all three ODLs are up. Shut down ODL1 via bin/stop; no exceptions like those below appear. Bring ODL1 back via bin/start and wait for the cluster to sync. Then take down ODL2 via bin/stop. The exception below repeats until well after ODL2 is restarted.

It makes sense that the connection refused shows up while ODL2 is down. What doesn't make sense is why this didn't happen when ODL1 was taken down. The second issue: after ODL2 was brought back, why didn't the exceptions stop as soon as the sync completed? They continued for a little while after the sync was finished.

2018-09-17T16:56:18,081 | WARN | opendaylight-cluster-data-akka.actor.default-dispatcher-34 | ClusterCoreDaemon | 41 - com.typesafe.akka.slf4j - 2.5.11 | Cluster Node [akka.tcp://opendaylight-cluster-data@10.30.170.121:2550] - Marking node(s) as UNREACHABLE [Member(address = akka.tcp://opendaylight-cluster-data@10.30.170.99:2550, status = Up)]. Node roles [member-1, dc-default]
2018-09-17T16:56:19,195 | WARN | opendaylight-cluster-data-akka.actor.default-dispatcher-35 | NettyTransport | 41 - com.typesafe.akka.slf4j - 2.5.11 | Remote connection to [null] failed with java.net.ConnectException: Connection refused: /10.30.170.99:2550
2018-09-17T16:56:19,198 | WARN | opendaylight-cluster-data-akka.actor.default-dispatcher-35 | ReliableDeliverySupervisor | 41 - com.typesafe.akka.slf4j - 2.5.11 | Association with remote system [akka.tcp://opendaylight-cluster-data@10.30.170.99:2550] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://opendaylight-cluster-data@10.30.170.99:2550]] Caused by: [Connection refused: /10.30.170.99:2550]
2018-09-17T16:56:25,191 | WARN | opendaylight-cluster-data-akka.actor.default-dispatcher-2 | NettyTransport | 41 - com.typesafe.akka.slf4j - 2.5.11 | Remote connection to [null] failed with java.net.ConnectException: Connection refused: /10.30.170.99:2550 |
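For context on the timing of the repeats: the "gated for [5000] ms" in the ReliableDeliverySupervisor warning comes from Akka remoting's retry-gate-closed-for setting, which defaults to 5 s; while the peer is down, each retry after the gate reopens fails again with "Connection refused", so the warnings recur on roughly that interval. A minimal sketch of where that knob lives, assuming a typical ODL install's configuration/initial/akka.conf with the usual odl-cluster-data wrapper (the value shown is just the default, not a suggested change):

# Sketch only -- file path and odl-cluster-data wrapper assumed from a stock ODL layout.
odl-cluster-data {
  akka {
    remote {
      # A failed association stays "gated" for this long before akka retries;
      # the 5 s default matches the "gated for [5000] ms" warnings above.
      retry-gate-closed-for = 5 s
    }
  }
}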
| Comments |
| Comment by Tom Pantelis [ 17/Sep/18 ] |
|
> The flow is all three ODLs are up. Then shutdown ODL1 via bin/stop. No exceptions like below. Bring back ODL1 via bin/start. Wait
> for cluster to sync. Then take down ODL2 via bin/stop. The exception below repeats until much after ODL2 is restarted.
> It makes sense that the connection refused comes out since ODL2 is down. What doesn't make sense is why this didn't happen when ODL1 was taken down.

Which node's log were you looking at? When ODL1 is taken down, you won't see such messages in its log - you would see them in the other nodes' logs.

> The second issue is why after ODL2 was brought back, why didn't the exceptions stop as soon as the sync completed. They continued for a little while after sync was finished.

What sync are you referring to? The "Connection refused" messages stop as soon as akka establishes a connection to the node. If they continued, then the connection hadn't been established yet.

Was there an actual failure scenario here that needs to be investigated, or are you just wondering about the messages?
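One way to see whether akka has actually re-established the connection, independent of any datastore sync, is to poll Akka's standard cluster JMX MBean through Karaf's Jolokia endpoint: once the restarted member is back in Members and the Unreachable attribute is empty, the gating/retry warnings should stop. A sketch, assuming the stock Jolokia mount on port 8181 and default admin credentials:

# Hypothetical check -- host, port, and credentials are from a typical setup.
curl -u admin:admin http://10.30.170.121:8181/jolokia/read/akka:type=Cluster
# The Members / Unreachable attributes come from akka's cluster JMX MBean
# (akka.cluster.jmx.enabled, which is on by default).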
|