[BGPCEP-766] Deadlocked PCEP
Created: 16/Mar/18  Updated: 18/Apr/18  Resolved: 21/Mar/18

| Status: | Verified |
| Project: | bgpcep |
| Component/s: | PCEP |
| Affects Version/s: | None |
| Fix Version/s: | Oxygen |
| Type: | Bug |
| Priority: | Medium |
| Reporter: | Claudio David Gasparini |
| Assignee: | Claudio David Gasparini |
| Resolution: | Done |
| Votes: | 0 |
| Labels: | csit:exception |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
Description

2018-03-16T00:47:59,642 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 355 - org.opendaylight.infrautils.metrics-impl - 1.3.0.SNAPSHOT | Oh nose - there are 2 deadlocked threads!!
Comments

Comment by Kit Lou [ 20/Mar/18 ]

cdgasparini - Is this issue a regression (i.e. used to work in prior releases)? What is the customer impact? Is there a workaround that can be documented? Can we ship with it, i.e. fix the problem in Oxygen-SR1?
Comment by Claudio David Gasparini [ 20/Mar/18 ]

Hi klou, this is a regression. The issue is a deadlock, which makes the feature unusable and affects dependent features as well, since everything freezes. There is no workaround, and I marked it as a blocker because I consider it a mandatory fix. If the issue were something other than a deadlock, I wouldn't have a problem moving this to SR1, but this is a must IMHO.
Regards,
Comment by Daniel Farrell [ 20/Mar/18 ]

cdgasparini - It looks like this same test was failing and was never signed off for Oxygen RC1 and RC2. We should have raised this as a blocker then and had it fixed well before now. Since our drop-dead release date is tomorrow, and there isn't a patch ready to merge immediately, I strongly recommend we do not block the release for this. If you disagree, you can take it to the TSC.
Comment by Claudio David Gasparini [ 20/Mar/18 ]

Hi dfarrell07, the test may have been failing, but not with this issue; that is why I raised the blocker. The fix is ready to be merged, so I don't see a problem with taking it. On the other hand, the BGP and PCEP features will be locked and unusable if they hit this bug. So I stand by this bug as a blocker.
Regards,
Comment by Daniel Farrell [ 20/Mar/18 ]

> So I dont see the issue on take the fix.

We would have to respin autorelease, hope we get lucky again and avoid all sporadic failures (unlikely), and then ask all projects to start over with reviewing the RC, submitting their Final Milestones and signing off on test failures.

> fix is ready to be merged

Please link the fix to the Jira.

> but not with this issue

So these are the new failures:

It does look like they have been failing quite a bit recently. I also see the same jobs failing in RC1.
Comment by Daniel Farrell [ 20/Mar/18 ]

I found the change: https://git.opendaylight.org/gerrit/#/c/69659/
Comment by Kit Lou [ 20/Mar/18 ]

cdgasparini - Please provide the gerrit patch info. We should have it merged and respin as soon as possible. The TSC can decide whether to block the existing RC3 for it.
Comment by Claudio David Gasparini [ 20/Mar/18 ]

Yes, I know the pain of the respin; I don't argue that. The difference shows up in the logs: on a visual check, the failing test looks the same, but in the karaf logs you will see -> org.opendaylight.infrautils.metrics-impl - 1.3.0 | Deadlocked vs RC1
Comment by Daniel Farrell [ 20/Mar/18 ]

As discussed on IRC and the mailing list, we're going to go ahead and merge 69659, kick off a new autorelease build (hoping we get lucky and avoid sporadic failures) and let the TSC vote on whether to redo RC3 in the meantime.
Comment by Daniel Farrell [ 20/Mar/18 ]

Discussion ongoing here: https://lists.opendaylight.org/pipermail/tsc/2018-March/009160.html
Comment by Claudio David Gasparini [ 20/Mar/18 ]

Thanks for your understanding. I hope we don't hit any failures.
Regards,