- Bug
- Resolution: Done
- Highest
- Magnesium SR2, Sodium SR4
- None
- None
This job appears to have regressed between SR1 and SR2. The job will pass from
time to time, but for the most part it fails in the Getmulti and Getsingle suites, which
mount 500 instances of the netconf-testtool and then attempt to "issue requests" on
each device in order (essentially a GET to yang-ext:mount for each device). When the
python tool doing those requests has trouble, the test case fails.
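For reference, the per-device check amounts to something like the sketch below. The controller address, credentials-free access, and the `netconf-scaling-device-N` naming scheme are assumptions for illustration; the URL layout follows the usual ODL RESTCONF mount-point convention, not the actual test code.

```python
# Hedged sketch of the per-device mount check the suites perform.
import urllib.request

CONTROLLER = "http://127.0.0.1:8181"  # assumed controller address

def mount_url(device_name: str) -> str:
    """Build the RESTCONF URL for a mounted netconf-testtool device."""
    return (f"{CONTROLLER}/restconf/operational/"
            f"network-topology:network-topology/topology/topology-netconf/"
            f"node/{device_name}/yang-ext:mount")

def check_device(device_name: str, timeout: float = 5.0) -> bool:
    """GET the device's mount point; True on HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(mount_url(device_name), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

# The suites walk all 500 simulated devices in order, e.g.:
#   for i in range(500):
#       check_device(f"netconf-scaling-device-{i}")  # assumed naming scheme
```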
In the sandbox, I ran a job with the SR1 release, which saw two failures in 35 iterations,
whereas a distro built recently (so SR2 bits) yielded 8 passes in 36 iterations.
Jenkins job history in the web UI only goes back 30 builds or so, but I pulled all the
console logs from build #225 (May 1st, just before SR1 was released) through the most
recent build, #327. Some of those jobs were infra aborts, but it's approximately 90 data points.
Here is a hacky spreadsheet to illustrate the results.
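The tally behind that spreadsheet can be reproduced with a short script along these lines. The `console-*.log` filenames, the "Aborted by" marker for infra aborts, and the Robot-style pass/fail summary line are assumptions about the saved console logs, not the exact tooling used.

```python
# Hedged sketch: classify each saved Jenkins console log as pass/fail/aborted.
import re
from pathlib import Path

def tally(log_dir: str) -> dict:
    """Count builds by outcome, skipping infra-aborted runs separately."""
    counts = {"pass": 0, "fail": 0, "aborted": 0}
    for log in sorted(Path(log_dir).glob("console-*.log")):
        text = log.read_text(errors="replace")
        if "Aborted by" in text:  # infra aborts, excluded from the stats
            counts["aborted"] += 1
        elif re.search(r"critical tests?, \d+ passed, 0 failed", text):
            counts["pass"] += 1
        else:
            counts["fail"] += 1
    return counts
```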
There seem to be two points of interest. The job was mostly passing for quite some
time, then had 10 straight failures before becoming stable again; that run of failures
started at job #265 (June 6th). Then job #301 failed again (July 11th), and the job has
been unstable since then.
Nothing in the netconf project stands out to me in the June 6th timeframe, but there
was an MRI bump on July 10th.
Here is a karaf.log from the most recent failed job. There is some ugliness that
may be a place to start looking. Search for "Getmulti.Issue_Requests_On_Devices" and
scroll down. It seems that one mount session went down, followed by a lot of "NETCONF
operation failed" exceptions.
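To pull that failure window out of a large karaf.log without scrolling, a helper like the one below works. The marker strings come from the log excerpt described above; the exact log layout is an assumption.

```python
# Hedged sketch: find the test-case marker in karaf.log, then collect the
# subsequent "NETCONF operation failed" lines that follow it.
def failure_window(log_path: str,
                   marker: str = "Getmulti.Issue_Requests_On_Devices",
                   limit: int = 20) -> list:
    """Return up to `limit` failure lines appearing after the marker."""
    hits, seen_marker = [], False
    with open(log_path, errors="replace") as f:
        for line in f:
            if marker in line:
                seen_marker = True
            elif seen_marker and "NETCONF operation failed" in line:
                hits.append(line.rstrip())
                if len(hits) >= limit:
                    break
    return hits
```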
- split to: INTTEST-123 netconf-csit-1node-scale-only-magnesium is failing (Resolved)