[INFRAUTILS-33] Expose ready and/or diagstatus via a non-authenticated URL Created: 11/Apr/18 Updated: 22/Aug/18 Resolved: 02/Jul/18 |
|
| Status: | Resolved |
| Project: | infrautils |
| Component/s: | diagstatus |
| Affects Version/s: | None |
| Fix Version/s: | Oxygen-SR3, Fluorine |
| Type: | Improvement | Priority: | Medium |
| Reporter: | Michael Vorburger | Assignee: | Michael Vorburger |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Description |
|
https://bugzilla.redhat.com/show_bug.cgi?id=1549218 and related internal discussion have revealed that it would be useful for the TripleO installation orchestrator to health check a non-authenticated URL for ready and/or diagstatus status. They cannot use the jolokia JMX HTTP bridge URL, which requires authentication, at the point where they need to make that check. It should not be very hard to write a Servlet which exposes information similar to the diag CLI command, with an HTTP status code (like 200 vs 503) and the body containing output similar to that CLI command. This will not break the existing diagstatus JMX bean exposed via (authenticated) /jolokia, but will be in addition to that. |
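The idea in the description can be sketched roughly as follows. This is a minimal Python illustration only, not the infrautils Servlet itself; the function name `build_response` is hypothetical, the field name `isOperational` is borrowed from the JSON output posted in the comments of this issue, and mapping it to 200 vs 503 is an assumption based on the "like 200 vs 503" wording above.

```python
import json
from typing import Tuple


def build_response(status: dict) -> Tuple[int, str]:
    """Map a diagstatus-like dict to an (HTTP status code, JSON body) pair.

    Assumption: 200 when the system reports itself operational, 503 otherwise.
    A missing or non-boolean "isOperational" is treated as not operational.
    """
    code = 200 if status.get("isOperational") is True else 503
    return code, json.dumps(status, indent=2)


if __name__ == "__main__":
    # Hypothetical "not ready" status, used only to show the 503 branch.
    code, body = build_response({"isOperational": False, "statusSummary": []})
    print(code)
    print(body)
```

Returning the status code and the body together keeps the mapping testable without an HTTP server; a real endpoint would simply write both to the response.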
| Comments |
| Comment by Michael Vorburger [ 16/Apr/18 ] |
|
jluhrsen, trozet, JankiChhatbar (and FYI k.faseela) I've started looking into this, and want to clarify 2 points before coding:
Are both OK for all of you? |
| Comment by Jamo Luhrsen [ 16/Apr/18 ] |
yeah, json response is fine. The healthcheck might/could just be as dumb as sending a curl to this endpoint and marking things
I'm not totally following what you are getting at here. I think this healthcheck endpoint is on a container by container basis |
| Comment by Faseela K [ 17/Apr/18 ] |
|
Agree with vorburger, the CLI is just an example implementation of a client to show how status can be retrieved from all nodes. The orchestrator will definitely know about all 3 nodes in a cluster, and hence it should be easy for the orchestrator to fetch the status from all nodes by constructing the REST URLs specific to the node IPs. |
| Comment by Janki Chhatbar [ 17/Apr/18 ] |
|
Healthcheck just curls the specified URL and says the container is healthy if the command returns 0. This is run from inside the container, so each ODL container in a cluster checks its own health. Hence no need to worry about the cluster setup. They won't even know whether they are in a cluster. |
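A per-container check along these lines could be sketched as below. This is only an illustration under assumptions: the names `evaluate` and `check` are hypothetical, the URL is the one reported elsewhere in this issue, and treating "HTTP 200 plus `isOperational` true" as healthy is an assumed policy, not a confirmed TripleO behaviour. The script exits with 0 when healthy and 1 otherwise, matching the "healthy if 0" convention described above.

```python
import json
import sys
import urllib.error
import urllib.request

# URL as reported in this issue; adjust host/port for the actual deployment.
DIAGSTATUS_URL = "http://localhost:8181/diagstatus/"


def evaluate(http_code: int, body: str) -> int:
    """Return a process exit code: 0 = healthy, 1 = unhealthy."""
    if http_code != 200:
        return 1
    try:
        status = json.loads(body)
    except ValueError:
        return 1  # body was not valid JSON
    if not isinstance(status, dict):
        return 1
    return 0 if status.get("isOperational") is True else 1


def check(url: str = DIAGSTATUS_URL, timeout: float = 5.0) -> int:
    """Fetch the URL once and translate the result into an exit code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate(resp.status, body)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for error statuses such as 503.
        return evaluate(err.code, "")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure: treat as unhealthy.
        return 1


if __name__ == "__main__":
    sys.exit(check())
```

Because the check only talks to its own container's endpoint, it needs no knowledge of the cluster layout.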
| Comment by Muthukumaran Kothandaraman [ 17/Apr/18 ] |
|
A minor clarification. Does the orchestrator use a static address list and curl the URL? When one of the cluster nodes is down, I assume it would be treated as a curl timeout for the corresponding node if the list is static. |
| Comment by Jamo Luhrsen [ 17/Apr/18 ] |
|
@everyone, I think we should ignore anything to do with clustering in this jira.
@janki, I don't think healthcheck has to be a simple curl, only. We can do whatever crazy bash-fu we want to do. So, in the end we could do curl to get the full response
| Comment by Michael Vorburger [ 20/Apr/18 ] |
|
Completed implementation, see 3 linked Gerrit reviews (c/70987, c/71168 and c/71172). With this, the odl-infrautils-diagstatus feature will expose this on http://localhost:8181/diagstatus/:
{
"timeStamp": "Fri Apr 20 16:41:26 CEST 2018",
"isOperational": true,
"systemReadyState": "ACTIVE",
"statusSummary": []
}
in infrautils/common/karaf with no other features, or e.g. with odl-netvirt-openstack like this:
{
"timeStamp": "Fri Apr 20 17:09:33 CEST 2018",
"isOperational": true,
"systemReadyState": "ACTIVE",
"statusSummary": [
{
"serviceName": "OPENFLOW",
"effectiveStatus": "OPERATIONAL",
"reportedStatusDescription": "switch connections started",
"statusTimestamp": "2018-04-20T15:08:29.713Z"
},
{
"serviceName": "IFM",
"effectiveStatus": "OPERATIONAL",
"reportedStatusDescription": "Service started",
"statusTimestamp": "2018-04-20T15:08:17.268Z"
},
{
"serviceName": "ITM",
"effectiveStatus": "OPERATIONAL",
"reportedStatusDescription": "Service started",
"statusTimestamp": "2018-04-20T15:08:19.617Z"
},
{
"serviceName": "ELAN",
"effectiveStatus": "OPERATIONAL",
"reportedStatusDescription": "Service started",
"statusTimestamp": "2018-04-20T15:08:20.191Z"
},
{
"serviceName": "DATASTORE",
"effectiveStatus": "OPERATIONAL",
"reportedStatusDescription": "OPERATIONAL",
"statusTimestamp": "2018-04-20T15:09:33.509Z"
}
]
}
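A client that wants more than a pass/fail answer could parse this body and report which services are not yet operational. The sketch below assumes the JSON shape shown above; the function name `failing_services` is hypothetical, and the `ERROR` value in the sample is a made-up placeholder for any status other than `OPERATIONAL`, not a confirmed diagstatus value.

```python
import json
from typing import List


def failing_services(body: str) -> List[str]:
    """Return the names of services whose effectiveStatus is not OPERATIONAL."""
    status = json.loads(body)
    return [
        entry.get("serviceName", "<unknown>")
        for entry in status.get("statusSummary", [])
        if entry.get("effectiveStatus") != "OPERATIONAL"
    ]


if __name__ == "__main__":
    # Trimmed sample in the shape shown above; "ERROR" is a placeholder value.
    sample = json.dumps({
        "isOperational": False,
        "statusSummary": [
            {"serviceName": "OPENFLOW", "effectiveStatus": "OPERATIONAL"},
            {"serviceName": "ITM", "effectiveStatus": "ERROR"},
        ],
    })
    print(failing_services(sample))
```

An empty `statusSummary`, as in the first example above, yields an empty list.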
BTW jluhrsen idea in FYI http://localhost:8181/diagstatus/ will return the following HTTP status codes:
HTH. |
| Comment by Michael Vorburger [ 25/Apr/18 ] |
|
This is all completely finished, done and dusted now - from my side. |
| Comment by Michael Vorburger [ 31/May/18 ] |
|
jluhrsen and thapar have internally suggested that this would be good to have not only on master for Fluorine but also on stable/oxygen already. I'm therefore reopening this issue and will look into doing the back-port, hopefully some time next week. |
| Comment by Michael Vorburger [ 01/Jun/18 ] |
|
jluhrsen is now using jolokia/exec/org.opendaylight.infrautils.diagstatus/... instead of /diagstatus, so my understanding is that there is no immediate need / request anymore to back-port this to stable/oxygen after all, so closing it again. |
| Comment by Michael Vorburger [ 20/Jun/18 ] |
|
Re-opening this for backporting to Oxygen SR3, so that |