[INTTEST-100] cluster_rest_script.py is aborting Created: 09/Jun/20 Updated: 04/Sep/20 Resolved: 04/Sep/20 |
|
| Status: | Resolved |
| Project: | integration-test |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Highest |
| Reporter: | Jamo Luhrsen | Assignee: | Robert Varga |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Description |
|
Something is newly broken with a test tool used by the controller jobs. The command line being used is:

python cluster_rest_script.py --host 10.30.170.86 --port 8181 add --itemtype car --itemcount 10000 --ipr 10000

but it's dying with something like this:

Traceback (most recent call last):
  File "cluster_rest_script.py", line 7, in <module>

The controller jobs using this tool are (obviously) failing because of it, but those |
| Comments |
| Comment by Robert Varga [ 12/Jun/20 ] |
|
This is a failure across all release trains. Mg detected it between
|
| Comment by Robert Varga [ 12/Jun/20 ] |
|
The most obvious difference seems to be:

@@ -281,2 +282,2 @@
-    'vm_1_image': 'ZZCI - Ubuntu 16.04 - mininet-ovs-28 - '
-                  '20190415-091034.881'},
+    'vm_1_image': 'ZZCI - Ubuntu 16.04 - mininet-ovs-28 - x86_64 - '
+                  '20200601-220226.013'},

I am not sure whether this is actually the image ODL runs on. |
| Comment by Robert Varga [ 12/Jun/20 ] |
|
This seems to be pointing at https://git.opendaylight.org/gerrit/c/releng/builder/+/90196/2/jjb/integration/integration-templates.yaml
|
| Comment by Thanh Ha (zxiiro) [ 12/Jun/20 ] |
|
Ran a job that directly launched "ZZCI - Ubuntu 16.04 - mininet-ovs-28 - x86_64 - 20200601-220226.013" and it appears to work (the traceback below is due to connecting to an IP address that doesn't exist, so ignore that). I'm not seeing a failure to import the requests library here. So what this tells me is that the base image might be okay. Perhaps something that happens in CSIT to prepare the image to run the job results in broken dependencies or similar?

python test/tools/odl-mdsal-clustering-tests/scripts/cluster_rest_script.py --host 10.30.170.86 --port 8181 add --itemtype car --itemcount 10000 --ipr 10000
/home/jenkins/.local/lib/python2.7/site-packages/requests/__init__.py:83: RequestsDependencyWarning: Old version of cryptography ([1, 2, 3]) may cause slowdown.
warnings.warn(warning, RequestsDependencyWarning)
2020-06-12 19:13:55,948 INFO: Add 10000 car(s) to 10.30.170.86:8181 (10000 per request)
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "test/tools/odl-mdsal-clustering-tests/scripts/cluster_rest_script.py", line 221, in _request_sender
rsp = ses.send(prep, timeout=req_timeout)
File "/home/jenkins/.local/lib/python2.7/site-packages/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/home/jenkins/.local/lib/python2.7/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='10.30.170.86', port=8181): Max retries exceeded with url: /restconf/config/car:cars (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7faf0a4e4750>: Failed to establish a new connection: [Errno 111] Connection refused',))
|
| Comment by Thanh Ha (zxiiro) [ 12/Jun/20 ] |
|
I'm not sure how to read the CSIT robot logs but if someone can get the complete traceback from the description that might be useful. What's pasted in the description isn't enough for me to see the full picture of what is failing. |
| Comment by Jamo Luhrsen [ 12/Jun/20 ] |
|
That's not the same trouble as this:

Traceback (most recent call last):
  File "cluster_rest_script.py", line 7, in <module>

Where is the link to the job where you got the above traceback? That probably means the |
| Comment by Thanh Ha (zxiiro) [ 12/Jun/20 ] |
|
I tried adding '-L TRACE' to the command but unfortunately didn't get much more detail:

18:06:38.893 TRACE Arguments: [ 'Traceback (most recent call last):\r\n File "cluster_rest_script.py", line 7, in <module>' ]
18:06:38.893 INFO Traceback (most recent call last):
  File "cluster_rest_script.py", line 7, in <module>
18:06:38.893 TRACE Return: None |
| Comment by Thanh Ha (zxiiro) [ 12/Jun/20 ] |
|
Tried to get debug logs by passing --debugfile to robot; the result is robot-debug.log.gz. This is a tough one, since the only thing we can infer from the failure at "line 7" is that something is causing the requests library to fail to import, but why exactly is hard to say without more information. |
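For context, a minimal sketch of the failure mode being inferred here, assuming line 7 of cluster_rest_script.py is a bare import of requests (a later comment in this ticket confirms that it is): an ImportError raised while the module is loading aborts the script before any test logic runs, so the traceback ends right at that line.

# Sketch of the inferred failure: a module-level import that dies when the
# package is absent, before argument parsing or any HTTP request is made.
try:
    import requests  # equivalent of line 7 in cluster_rest_script.py
except ImportError as exc:
    raise SystemExit("requests is not importable on this VM: %s" % exc)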
| Comment by Jamo Luhrsen [ 13/Jun/20 ] |
|
Do you have the .html robot file and/or the output.xml file? |
| Comment by Robert Varga [ 14/Jun/20 ] |
|
Same thing is happening to https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/bgpcep-csit-1node-throughpcep-all-aluminium/99/robot-plugin/log.html.gz
|
| Comment by Thanh Ha (zxiiro) [ 14/Jun/20 ] |
|
jluhrsen, can you confirm which system the python scripts run on? Is it the robot VM or the mininet VM? If I boot a mininet VM directly, clone the integration/test repo, and attempt to run the mdsal python script, I get:

Traceback (most recent call last):
  File "tools/odl-mdsal-clustering-tests/scripts/cluster_rest_script.py", line 7, in <module>
    import requests
ImportError: No module named requests

https://jenkins.opendaylight.org/sandbox/view/All/job/zxiiro-test/7/console

This is pretty similar to the CSIT job run. Thinking about the mininet VMs changing recently, could this be a case of the old VM having robot tools like requests pre-installed vs. the new mininet VM just plain missing them? Or is CSIT supposed to install requests at runtime? I'm not sure if the old mininet image is still available, but if it is, maybe we can get the LF to boot one and do a pip freeze on it. |
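For reference, a minimal diagnostic sketch (hypothetical, not part of integration/test) that could be run with the system python2 on the TOOLS_SYSTEM/mininet VM to see which of the modules discussed in this ticket are importable, and at what version:

# Hypothetical diagnostic: report whether the suspect modules import cleanly.
import importlib

for name in ("requests", "chardet", "urllib3"):
    try:
        mod = importlib.import_module(name)
        print("OK      %s %s" % (name, getattr(mod, "__version__", "unknown")))
    except ImportError as exc:
        print("MISSING %s (%s)" % (name, exc))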
| Comment by Luis Gomez [ 14/Jun/20 ] |
|
This is a very simple problem: the issue shows up when running python scripts in the TOOLS_SYSTEM VM.
The below packages are missing in the new image but present in the old one: chardet (2.3.0)

So until we figure out how to fix the image or the robot test to install the missing modules, I think it is better to revert the originating patch: https://git.opendaylight.org/gerrit/c/releng/builder/+/90270 |
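As a stop-gap, a sketch of what installing the missing modules at job runtime could look like (hypothetical only; it assumes pip is available to the job user on the TOOLS_SYSTEM VM and is not the fix that was ultimately pursued):

# Hypothetical runtime workaround: install the modules missing from the new
# image into the job user's site-packages before the test tool runs.
import subprocess
import sys

MISSING_MODULES = ["requests", "chardet", "urllib3"]

subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "--user"] + MISSING_MODULES
)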
| Comment by Luis Gomez [ 14/Jun/20 ] |
|
The issue shows up in the mininet/TOOLS_SYSTEM VM. It looks like the new image is missing some modules, including requests (see my other comment). |
| Comment by Robert Varga [ 15/Jun/20 ] |
|
https://git.opendaylight.org/gerrit/c/releng/builder/+/90446 should result in these being installed:

git-review-1.25.0-2
python-chardet-2.3.0-2
python-urllib3-1.13.1-2ubuntu0.16.04.3
python-requests-2.9.1-3ubuntu0.1
python-ndg-httpsclient-0.4.0-3 |
| Comment by Robert Varga [ 15/Jun/20 ] |
|
[15/06/2020 16:56:50] <rovarga> zxiiro: did we switch provisioning stuff when we went to vexxhost?
[15/06/2020 16:57:16] <rovarga> zxiiro: builder/packer/provision/baseline.sh has:
[15/06/2020 16:57:18] <rovarga> ensure_ubuntu_install git-review
[15/06/2020 16:57:28] <zxiiro> rovarga: bash scripts to ansible. I can't remember if we were using packer in both places (I assume so)
[15/06/2020 16:57:45] <rovarga> ooooh
[15/06/2020 16:57:52] <rovarga> okay, so that's how we lost these packages
[15/06/2020 16:58:33] <rovarga> note that baseline.sh is not referenced anywhere except gbp.sh
[15/06/2020 16:58:44] <zxiiro> Vexxhost change wasn't 2019 though.
[15/06/2020 16:59:14] <zxiiro> So we would have discovered issues earlier than that no?
[15/06/2020 17:00:05] <rovarga> zxiiro: are you sure?
[15/06/2020 17:00:11] <zxiiro> Although I did not spend much time managing mininet VMs possible those VMs still used the bash provisioning scripts maybe?
[15/06/2020 17:00:36] <zxiiro> rovarga: yes we switched to Vexxhost earlier than 2019. I can't remember the exact year. maybe tykeal remembers.
[15/06/2020 17:00:43] <rovarga> there definitely was some work being done in March-May 2019: https://git.opendaylight.org/gerrit/c/releng/builder/+/80752
[15/06/2020 17:01:50] <zxiiro> rovarga: Right, it's possible that the mininet scripts were not updated when we migrated and took longer to get done. We focused on java-builder with the initial migration.
[15/06/2020 17:02:08] <zxiiro> mininet likely lagged behind with the ansible change
[15/06/2020 17:03:30] <zxiiro> ok so according to your comment on INTTEST the old image was dated 20190415-091034.881 that does seem plausible.
[15/06/2020 17:06:11] <rovarga> exactly
[15/06/2020 17:06:26] <rovarga> so this was broken by a change which was merged afterwards
[15/06/2020 17:06:28] <rovarga> *or*
[15/06/2020 17:06:32] <rovarga> NOT merged
[15/06/2020 17:06:49] <rovarga> there is no equivalent of https://git.opendaylight.org/gerrit/c/releng/builder/+/80752
[15/06/2020 17:07:02] <rovarga> for mininet-ovs-2.8
[15/06/2020 17:07:23] <rovarga> nite@nitebug : ~/odl/builder/packer on master $ diff -u templates/mininet-ovs-2.6.json templates/mininet-ovs-2.8.json | diffstat
[15/06/2020 17:07:23] <rovarga> mininet-ovs-2.8.json | 8 ++++----
[15/06/2020 17:07:23] <rovarga> 1 file changed, 4 insertions(+), 4 deletions(-)
[15/06/2020 17:07:23] <rovarga> nite@nitebug : ~/odl/builder/packer on master $ diff -u provision/mininet-ovs-2.6.yaml provision/mininet-ovs-2.8.yaml | diffstat
[15/06/2020 17:07:23] <rovarga> mininet-ovs-2.8.yaml | 108 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------
[15/06/2020 17:07:23] <rovarga> 1 file changed, 81 insertions(+), 27 deletions(-)
[15/06/2020 17:08:12] <rovarga> i.e. while the templates are mostly the same, the playbooks are vastly different
[15/06/2020 17:18:48] <rovarga> zxiiro: and the delta of installing git-review is what broke us
[15/06/2020 17:18:52] <rovarga> https://packages.ubuntu.com/xenial/git-review
[15/06/2020 17:19:05] <rovarga> i.e. git-review pulls in python-requests (for python2)
[15/06/2020 17:19:19] <zxiiro> hmm interesting.
[15/06/2020 17:19:21] <rovarga> LuisGomez: ^^^^
[15/06/2020 17:19:46] <rovarga> I suspect it does not matter for other mininets, as we are not using them in this capacity
[15/06/2020 17:19:57] <zxiiro> I don't think we should install git-review on mininet as it doesn't need it and shouldn't be doing anything with Gerrit so we should just pull in requests directly.
[15/06/2020 17:20:07] <rovarga> yup
[15/06/2020 17:20:21] <rovarga> we do not have the smoking gun of a commit
[15/06/2020 17:20:41] <rovarga> but I think we have enough circumstantial evidence to explain what happened
[15/06/2020 17:20:44] <rovarga> and how to fix it
[15/06/2020 17:20:53] <zxiiro> agreed. |
| Comment by Venkatrangan Govindarajan [ 27/Aug/20 ] |
|
The issue does not affect the Aluminium release. |
| Comment by Jamo Luhrsen [ 04/Sep/20 ] |
|
This is no longer happening in Mg or Al. I'm guessing it was fixed unknowingly. Closing this ticket as unreproducible.