[INFRAUTILS-38] showSvcStatus: Status retrieval JMX Operation failed for node 127.0.0.1 due to rmi.ConnectException: Connection refused to host: 127.0.0.1 Created: 02/May/18  Updated: 22/May/18  Resolved: 22/May/18

Status: Verified
Project: infrautils
Component/s: diagstatus
Affects Version/s: None
Fix Version/s: Fluorine

Type: Bug Priority: Medium
Reporter: Michael Vorburger Assignee: Faseela K
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

jluhrsen is seeing this on particular environment (but not everywhere) :

We initially had some confusion because 2 separate logs were submitted. This bug with the new log has nothing to do with INFRAUTILS-36 (part of INFRAUTILS-31) nor is it related to the technical enhancement proposed in INFRAUTILS-37.

I suspect this could perhaps be related to that environment having a particular networking config.

2018-05-02T14:01:50,905 | ERROR | pipe-showSvcStatus | DiagStatusCommand | 233 - org.opendaylight.infrautils.diagstatus-shell - 1.4.0.SNAPSHOT | Exception while reaching Host 127.0.0.1
java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is: 
java.net.ConnectException: Connection refused (Connection refused)]
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:369) [?:?]
at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270) [?:?]
at org.opendaylight.infrautils.diagstatus.MBeanUtils.invokeRemoteJMXOperation(MBeanUtils.java:160) [231:org.opendaylight.infrautils.diagstatus-api:1.4.0.SNAPSHOT]
at org.opendaylight.infrautils.diagstatus.shell.DiagStatusCommand.getRemoteStatusSummary(DiagStatusCommand.java:74) [233:org.opendaylight.infrautils.diagstatus-shell:1.4.0.SNAPSHOT]
at org.opendaylight.infrautils.diagstatus.shell.DiagStatusCommand.execute(DiagStatusCommand.java:47) [233:org.opendaylight.infrautils.diagstatus-shell:1.4.0.SNAPSHOT]
at org.apache.karaf.shell.commands.basic.AbstractCommand.execute(AbstractCommand.java:34) [49:org.apache.karaf.shell.core:4.1.5]
at Proxybdb3a43e_422e_473e_8dcb_4c63b2f0ef41.execute(Unknown Source) [?:?]
at org.apache.karaf.shell.compat.CommandTracker.execute(CommandTracker.java:120) [49:org.apache.karaf.shell.core:4.1.5]
at org.apache.karaf.shell.impl.console.osgi.secured.SecuredCommand.execute(SecuredCommand.java:68) [49:org.apache.karaf.shell.core:4.1.5]
at org.apache.karaf.shell.impl.console.osgi.secured.SecuredCommand.execute(SecuredCommand.java:86) [49:org.apache.karaf.shell.core:4.1.5]
at org.apache.felix.gogo.runtime.Closure.executeCmd(Closure.java:571) [49:org.apache.karaf.shell.core:4.1.5]
at org.apache.felix.gogo.runtime.Closure.executeStatement(Closure.java:497) [49:org.apache.karaf.shell.core:4.1.5]
at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:386) [49:org.apache.karaf.shell.core:4.1.5]
at org.apache.felix.gogo.runtime.Pipe.doCall(Pipe.java:417) [49:org.apache.karaf.shell.core:4.1.5]
at org.apache.felix.gogo.runtime.Pipe.call(Pipe.java:229) [49:org.apache.karaf.shell.core:4.1.5]
at org.apache.felix.gogo.runtime.Pipe.call(Pipe.java:59) [49:org.apache.karaf.shell.core:4.1.5]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:?]
at java.util.concurrent.ThreadPoolExecutor.run(ThreadPoolExecutor.java:624) [?:?]
at java.lang.Thread.run(Thread.java:748) [?:?]
Caused by: javax.naming.ServiceUnavailableException
at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:136) ~[?:?]
at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:205) ~[?:?]
at javax.naming.InitialContext.lookup(InitialContext.java:417) ~[?:?]
at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1955) ~[?:?]
at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1922) ~[?:?]
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:287) ~[?:?]
... 19 more
Caused by: java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is: 
java.net.ConnectException: Connection refused (Connection refused)
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619) ~[?:?]
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216) ~[?:?]
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202) ~[?:?]
at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:338) ~[?:?]
at sun.rmi.registry.RegistryImpl_Stub.lookup(RegistryImpl_Stub.java:112) ~[?:?]
at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:132) ~[?:?]
at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:205) ~[?:?]
at javax.naming.InitialContext.lookup(InitialContext.java:417) ~[?:?]
at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1955) ~[?:?]
at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1922) ~[?:?]
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:287) ~[?:?]
... 19 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:?]
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:?]
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:?]
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:?]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?]
at java.net.Socket.connect(Socket.java:589) ~[?:?]
at java.net.Socket.connect(Socket.java:538) ~[?:?]
at java.net.Socket.<init>(Socket.java:434) ~[?:?]
at java.net.Socket.<init>(Socket.java:211) ~[?:?]
at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40) ~[?:?]
at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:148) ~[?:?]
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613) ~[?:?]
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216) ~[?:?]
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202) ~[?:?]
at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:338) ~[?:?]
at sun.rmi.registry.RegistryImpl_Stub.lookup(RegistryImpl_Stub.java:112) ~[?:?]
at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:132) ~[?:?]
at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:205) ~[?:?]
at javax.naming.InitialContext.lookup(InitialContext.java:417) ~[?:?]
at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1955) ~[?:?]
at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1922) ~[?:?]
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:287) ~[?:?]
... 19 more


 Comments   
Comment by Michael Vorburger [ 02/May/18 ]

k.faseela do you want to assign this to yourself, add some logging, and follow up with jluhrsen, or were you hoping I would?

Comment by Faseela K [ 02/May/18 ]

jluhrsen : Do you hit the issue in a script, or when you run this manually? Self-address is derived from an akka based configuration, wondering whether your script is executing showSvcStatus CLI too early, before this is exposed. And I am wondering whether DNS is not configured properly on your system, that it results in the error.

 

Else, if you are running this manually giving enough time, could you please give me the output of :

http://<ODL-IP>:8181/jolokia/read/akka:type=Cluster

 

Comment by Jamo Luhrsen [ 02/May/18 ]

ok, so I have learned more about this one. INFRAUTILS-37 and this are not the same root cause.

TL;DR the /etc/hosts file matters, and stuff internally with odl/jmx/etc will choke if the $(hostname) is not there.

This issue is something that used to show up for me in the karaf console log and I could never figure out what
was wrong, but did not try very hard as it didn't seem to bother anything I was doing. I don't have the old error
I'd see on the console, but it was jmx related and errors connecting to 127.0.0.1

So, the fix for me is to have a proper /etc/hosts file. It was this:

127.0.0.1   localhost localhost.localdomain jamo-rc701.rdocloud
::1         localhost localhost.localdomain jamo-rc701.rdocloud

but, after changing it to this:

127.0.0.1   localhost localhost.localdomain
::1         localhost localhost.localdomain

it works.

This begs the question of what is going on under the hood, because the error says that it's a connection
refused to 127.0.0.1, because I am able to open a connection to 127.0.0.1 on port 6886 which is the
relevant port here. see below with 6886 working and 6887 being refused like we'd expect:

$ nc 127.0.0.1 6886
^C
]$ nc 127.0.0.1 6887
Ncat: Connection refused.

I can't think of how this matters yet, but we can chalk this up to a user configuration issue, but
that was pretty lame to hunt down.

Comment by Faseela K [ 03/May/18 ]

Jamo, JMX RMI uses InetAddress.getHostByName() inside, and if /etc/hosts does not have the hostname, it does not work.

But, what is still not clear to me is, why we go via JMX RMI for 127.0.0.1 which is local address.

Output of this will help me to analyse that further : http://<ODL-IP>:8181/jolokia/read/akka:type=Cluster

 

Comment by Jamo Luhrsen [ 03/May/18 ]
$ curl -u "admin:admin" http://localhost:8181/jolokia/read/akka:type=Cluster | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   651    0   651    0     0  83482      0 --:--:-- --:--:-- --:--:-- 93000
{
    "request": {
        "mbean": "akka:type=Cluster",
        "type": "read"
    },
    "status": 200,
    "timestamp": 1525358717,
    "value": {
        "Available": true,
        "ClusterStatus": "{\n  \"members\": [\n    {\n      \"address\": \"akka.tcp://opendaylight-cluster-data@127.0.0.1:2550\",\n      \"roles\": [\n        \"dc-default\",\n        \"member-1\"\n      ],\n      \"status\": \"Up\"\n    }\n  ],\n  \"self-address\": \"akka.tcp://opendaylight-cluster-data@127.0.0.1:2550\",\n  \"unreachable\": []\n}\n",
        "Leader": "akka.tcp://opendaylight-cluster-data@127.0.0.1:2550",
        "MemberStatus": "Up",
        "Members": "akka.tcp://opendaylight-cluster-data@127.0.0.1:2550",
        "Singleton": true,
        "Unreachable": ""
    }
}

So, what are you looking for in that output?

Comment by Faseela K [ 04/May/18 ]

in this output, if self-address matches the ip in the cluster-members, we do localStatusSummary(). I was wondering why the exception you have given shows that RemoteStatusSummary() failed for 127.0.0.1.

 

Could you please confirm if you are using a build which has this fix?

https://git.opendaylight.org/gerrit/#/c/70842/

 

Comment by Jamo Luhrsen [ 04/May/18 ]

I think it was, but it was from 4/17, which is when that patch was merged. Anyway,
below is the same request on a distro built today. looks like the same output:

$ curl -u "admin:admin" http://localhost:8181/jolokia/read/akka:type=Cluster | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   651    0   651    0     0   5482      0 --:--:-- --:--:-- --:--:--  5516
{
    "request": {
        "mbean": "akka:type=Cluster",
        "type": "read"
    },
    "status": 200,
    "timestamp": 1525458172,
    "value": {
        "Available": true,
        "ClusterStatus": "\{\n  \"members\": [\n    {\n      \"address\": \"akka.tcp://opendaylight-cluster-data@127.0.0.1:2550\",\n      \"roles\": [\n        \"dc-default\",\n        \"member-1\"\n      ],\n      \"status\": \"Up\"\n    }\n  ],\n  \"self-address\": \"akka.tcp://opendaylight-cluster-data@127.0.0.1:2550\",\n  \"unreachable\": []\n}\n",
        "Leader": "akka.tcp://opendaylight-cluster-data@127.0.0.1:2550",
        "MemberStatus": "Up",
        "Members": "akka.tcp://opendaylight-cluster-data@127.0.0.1:2550",
        "Singleton": true,
        "Unreachable": ""
    }
}
Comment by Jamo Luhrsen [ 09/May/18 ]

k.faseela, anything more I can do here? I don't think this is probably a high priority/severity bug, but just making sure the

ball is not in my court

Comment by Faseela K [ 09/May/18 ]

jluhrsen only thing is whether you had the patch I mentioned above while testing? If it is there the error would not have come even if your /etc/hosts is not proper for single node . Anyways should we close the bug saying it is a missing config?

Comment by Jamo Luhrsen [ 09/May/18 ]

k.faseela, no I confirmed that I had the patch and gave the output in my last comment. Do you want me to try again?

Comment by Faseela K [ 10/May/18 ]

Yes, I made sathwiksai to try this out locally, and we always see that for 127.0.0.1 it fetches the local status and never go via RMI. In fact that was the main reason why the above patch was pushed. Yesterday, even Vishal tried this out, and we did not see the error. Probably we should try removing hostname from /etc/hosts and see if it works? Will try that out and update.

Comment by sathwik boggarapu [ 10/May/18 ]

I tried this locally, removing the hostname in etc/hosts and it worked. I have seen the same before this patch (https://git.opendaylight.org/gerrit/#/c/70842/) pushed. that's the main reason the patch was pushed.

Comment by Jamo Luhrsen [ 22/May/18 ]

just for kicks I double checked with and without a good /etc/hosts file and I don't see the high level symptoms with those
exceptions. closing this as resolved and verified.

Generated at Wed Feb 07 20:02:06 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.