[CONTROLLER-2010] Get Entities RPC fails after owner is killed Created: 02/Nov/21  Updated: 14/Nov/21  Resolved: 14/Nov/21

Status: Resolved
Project: controller
Component/s: eos
Affects Version/s: 4.0.5
Fix Version/s: 4.0.7

Type: Bug Priority: Medium
Reporter: Sangwook Ha Assignee: Tomas Cere
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

The get-entities RPC call returns "RPC failed" after the owner of an entity is killed.

To reproduce:
1. Set up a 3-node cluster
2. Install the OpenFlow flow services feature (odl-openflowplugin-flow-services-rest)
3. Get the list of entities with the get-entities RPC
4. Kill the owner: e.g. on the owner's host, run kill -9 <CONTROLLER_PID>
5. Get the list of entities from one of the two instances still running: the call returns "RPC failed" with status code 500
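
Steps 3 and 5 can be scripted roughly as follows. This is a minimal sketch, not the reporter's actual tooling: the RPC path is taken from this report, while the port (8181), the admin/admin credentials, the host names, and the choice to send an empty request body are assumptions.

```python
import base64
import json
import urllib.error
import urllib.request

# RPC path as shown in this report.
RPC_PATH = "/rests/operations/odl-entity-owners:get-entities"


def build_request(host, port=8181, user="admin", password="admin"):
    """Build the POST request; port and credentials are assumed defaults."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"http://{host}:{port}{RPC_PATH}",
        method="POST",  # no body: assumes the RPC needs no input
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def get_entities(host, timeout=10):
    """Return (status_code, parsed_json_body) for one cluster member."""
    try:
        with urllib.request.urlopen(build_request(host), timeout=timeout) as resp:
            return resp.status, json.loads(resp.read() or b"{}")
    except urllib.error.HTTPError as err:
        # An HTTP 500 still carries the RESTCONF "errors" body, so parse it too.
        return err.code, json.loads(err.read() or b"{}")


if __name__ == "__main__":
    # Hypothetical addresses of the two members still running after step 4.
    for host in ("member-2.example", "member-3.example"):
        print(host, *get_entities(host))
```

Running it once before step 4 and once after makes the change from a 200 response to a 500 response visible for each surviving member.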

Example of the response of get-entities in step 3:

POST /rests/operations/odl-entity-owners:get-entities
{
    "odl-entity-owners:output": {
        "entities": [
            {
                "type": "org.opendaylight.mdsal.AsyncServiceCloseEntityType",
                "name": "ofp-topology-manager",
                "candidate-nodes": [
                    "member-1"
                ],
                "owner-node": "member-1"
            },
            {
                "type": "org.opendaylight.mdsal.ServiceEntityType",
                "name": "ofp-topology-manager",
                "candidate-nodes": [
                    "member-1",
                    "member-2",
                    "member-3"
                ],
                "owner-node": "member-1"
            }
        ]
    }
}
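
To decide which member to kill in step 4, the owner of each entity can be read out of a response like the one above. The helper below is an illustrative sketch; the function name is hypothetical, and the field names are taken from the sample response.

```python
import json


def owners_by_entity(response):
    """Map (type, name) of each entity to its owner-node."""
    entities = response.get("odl-entity-owners:output", {}).get("entities", [])
    return {(e["type"], e["name"]): e.get("owner-node") for e in entities}


# Abbreviated copy of the sample response from this report.
sample = json.loads("""
{
    "odl-entity-owners:output": {
        "entities": [
            {
                "type": "org.opendaylight.mdsal.ServiceEntityType",
                "name": "ofp-topology-manager",
                "candidate-nodes": ["member-1", "member-2", "member-3"],
                "owner-node": "member-1"
            }
        ]
    }
}
""")

owners = owners_by_entity(sample)
print(owners)
```

For the sample above, the only owner reported is member-1, which is the member killed in this reproduction.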

Response of get-entities in step 5 after killing 'member-1':

POST /rests/operations/odl-entity-owners:get-entities
{
    "errors": {
        "error": [
            {
                "error-tag": "operation-failed",
                "error-info": "<severity>error</severity>",
                "error-message": "RPC failed",
                "error-type": "rpc"
            }
        ]
    }
}
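
A check for this failure mode can look for the "errors" container in the response body. The following is a sketch under the assumption that error responses always have the shape shown above; the function name is hypothetical.

```python
import json


def rpc_errors(response):
    """Return (error-tag, error-message) pairs from an error response."""
    errors = response.get("errors", {}).get("error", [])
    return [(e.get("error-tag"), e.get("error-message")) for e in errors]


# The error response from this report.
failed = json.loads("""
{
    "errors": {
        "error": [
            {
                "error-tag": "operation-failed",
                "error-info": "<severity>error</severity>",
                "error-message": "RPC failed",
                "error-type": "rpc"
            }
        ]
    }
}
""")

found = rpc_errors(failed)
print(found)
```

An empty list means the body contained no "errors" container, as in a successful get-entities response.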


 Comments   
Comment by Sangwook Ha [ 02/Nov/21 ]

This causes the test case "Check Entity Owner Status And Find Owner and Successor After Fail" to fail in the 3-node clustering CSIT job openflowplugin-csit-3node-clustering-only-phosphorus:

https://s3-logs.opendaylight.org/logs/releng/vex-yul-odl-jenkins-1/openflowplugin-csit-3node-clustering-only-phosphorus/206/robot-plugin/log.html.gz
