[NETCONF-574] No status for rogue device in callhome server API Created: 09/Oct/18  Updated: 02/Jun/20

Status: In Progress
Project: netconf
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Medium
Reporter: Luis Gomez Assignee: Manoj Chokka
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

When a rogue device (one that is not in the whitelist) tries to connect to the controller (callhome server), the connection is refused (OK), but the device status "FAILED_NOT_ALLOWED" does not show up in the callhome server API (NOK):

GET /restconf/operational/odl-netconf-callhome-server:netconf-callhome-server
404
{"errors":{"error":[{"error-type":"application","error-tag":"data-missing","error-message":"Request could not be completed because the relevant data model content does not exist "}]}}
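For illustration only, here is a minimal Python sketch of how a test client might recognize this response. It parses the error body shown above and checks for the "data-missing" tag; the helper names restconf_error_tags and is_data_missing are hypothetical and are not part of the CSIT suite or of OpenDaylight.

```python
import json


def restconf_error_tags(body: str) -> list:
    """Return every error-tag found in a RESTCONF error body."""
    payload = json.loads(body)
    errors = payload.get("errors", {}).get("error", [])
    return [entry.get("error-tag", "") for entry in errors]


def is_data_missing(status: int, body: str) -> bool:
    """True when the reply is a 404 carrying the data-missing error tag."""
    return status == 404 and "data-missing" in restconf_error_tags(body)


# Body copied from the response reported in this ticket.
body = (
    '{"errors":{"error":[{"error-type":"application",'
    '"error-tag":"data-missing",'
    '"error-message":"Request could not be completed because the '
    'relevant data model content does not exist "}]}}'
)

print(is_data_missing(404, body))
```

A reply of this kind means the operational datastore holds no callhome server data at all, which is different from a device entry that exists but carries an unexpected status.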

In the karaf log it can be seen:

2018-10-09T09:34:13,471 | INFO  | remote-connector-processing-executor-11 | NetconfDevice                    | 287 - org.opendaylight.netconf.sal-netconf-connector - 1.8.1.SNAPSHOT | RemoteDevice{netopeer}: Netconf connector initialized successfully
2018-10-09T09:34:13,474 | WARN  | opendaylight-cluster-data-notification-dispatcher-48 | CallhomeStatusReporter           | 268 - org.opendaylight.netconf.callhome-provider - 1.5.1.SNAPSHOT | No corresponding callhome device found - exiting.

So maybe this is expected behavior.

In any case, the failing test has been commented out in this patch:
https://git.opendaylight.org/gerrit/#/c/76807



 Comments   
Comment by Jakub Morvay [ 23/Oct/18 ]

I've just quickly run through the implementation, and it seems that the device should be reported as FAILED_NOT_ALLOWED.

We need to investigate further why it is not reported that way.

Comment by Manoj Chokka [ 13/Jan/20 ]

Update on the analysis done so far:

  • The code does not have any issues.
  • During manual testing we were not able to reproduce the issue. We were able to see the message FAILED_NOT_ALLOWED in the device status for rogue devices.
  • Upon examining the callhome.robot test case, we found that "apt-get update" takes longer than expected, while the test case waits only 30s. This is why the test case fails.
  • A simple timeout change would do the trick.
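The effect of the wait time can be sketched with a small polling loop in Python. This is not the Robot Framework code from callhome.robot; wait_until, FakeClock and the 120 s setup duration are assumptions made for illustration, and a simulated clock is used so the example runs instantly.

```python
import time


def wait_until(check, timeout_s, interval_s=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


class FakeClock:
    """Simulated clock: sleeping just advances the current time."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def simulate(timeout_s, ready_after_s):
    """Report whether a device that is ready after ready_after_s is seen in time."""
    fake = FakeClock()
    return wait_until(lambda: fake.time() >= ready_after_s, timeout_s,
                      interval_s=5.0, clock=fake.time, sleep=fake.sleep)


# Hypothetical case: container setup (apt-get) takes 120 s.
print(simulate(30, 120))   # 30 s wait gives up before setup finishes
print(simulate(300, 120))  # a longer wait observes the expected status
```

With a 30 s wait the check gives up while the container is still installing packages, so the status is never observed even though the controller behaves correctly.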

 

Comment by Luis Gomez [ 28/Jan/20 ]

I will fix this in the CSIT.

Comment by Manoj Chokka [ 28/May/20 ]

Hi Luis,

I found a better fix for the test case.

It looks like the entries in sources.list in the docker container work some of the time and fail at other times.

I found the correct sources list at https://wiki.debian.org/LTS/Using:

deb http://deb.debian.org/debian/ jessie main contrib non-free

deb-src http://deb.debian.org/debian/ jessie main contrib non-free

deb http://security.debian.org/ jessie/updates main contrib non-free

deb-src http://security.debian.org/ jessie/updates main contrib non-free

 

To reflect these changes:

  • Create a sources.list file in ~/integration-test/csit/variables/netconf/callhome.
  • Add the above sources to the file.
  • Add a volume in the docker-compose.yaml file:
    • -   /home/chokkma/integration-test/csit/variables/netconf/callhome/sources.list:/etc/apt/sources.list
  • The above volume entry will mount the new sources.list over /etc/apt/sources.list in the container.
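The steps above could be scripted roughly as follows. This is a sketch only: write_sources_list and compose_volume_entry are hypothetical helpers, and a temporary directory stands in for the real csit/variables/netconf/callhome directory so the example can run anywhere.

```python
import tempfile
from pathlib import Path

# The four entries quoted above from the Debian LTS wiki page.
JESSIE_SOURCES = [
    "deb http://deb.debian.org/debian/ jessie main contrib non-free",
    "deb-src http://deb.debian.org/debian/ jessie main contrib non-free",
    "deb http://security.debian.org/ jessie/updates main contrib non-free",
    "deb-src http://security.debian.org/ jessie/updates main contrib non-free",
]


def write_sources_list(directory: Path) -> Path:
    """Create sources.list in the given directory and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / "sources.list"
    target.write_text("\n".join(JESSIE_SOURCES) + "\n")
    return target


def compose_volume_entry(host_file: Path) -> str:
    """Build the host:container string for the docker-compose volumes list."""
    return f"{host_file.resolve()}:/etc/apt/sources.list"


# Demo in a throwaway directory (assumption: the real target is the CSIT tree).
with tempfile.TemporaryDirectory() as tmp:
    created = write_sources_list(Path(tmp) / "callhome")
    entry = compose_volume_entry(created)
    line_count = len(created.read_text().splitlines())
    print(entry)
```

The printed string has the same shape as the volume line in the list above, with the host path on the left and /etc/apt/sources.list on the right.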

We also need to change the timeout to 300s in callhome.robot, to allow 'apt-get update' and 'apt-get install curl' to finish successfully.

 

If you approve these changes, I can make the changes and push them.

Please let me know.

 

BR

Manoj

 

Comment by Jamo Luhrsen [ 02/Jun/20 ]

cmanoj, per the conversation we had in the kernel call today, is the "fix" for this issue
going to happen entirely on the test side of things? If so, we can move this ticket to that project.
Let me know and I can take care of that.

Generated at Wed Feb 07 20:15:22 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.