[CONTROLLER-1224] An attempt to push 5k paths to ODL SR3 through BGP exposed a bug which causes the odl-restconf-noauth to hang on install approx 3% of the time Created: 24/Mar/15 Updated: 23/Jul/15 Resolved: 23/Jul/15 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | restconf |
| Affects Version/s: | Helium |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | RichardHill | Assignee: | Jozef Behran |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| External issue ID: | 2891 |
| Description |
|
Jozez Behran created a test job that repeatedly performs a small load test of Helium SR3 during attempt to push 5k paths to ODL through BGP. It exposes a bug which causes the odl-restconf-noauth to hang on install occasionally. He has captured a YourKit performance snapshot in the time when the test failed due to the hang so it is now possible to see what exactly is blocked and where. I'll add as attachment. |
| Comments |
| Comment by RichardHill [ 24/Mar/15 ] |
|
https://drive.google.com/file/d/0ByXiyf4iY7RYbXZWdlVYdHV6clE/view?usp=sharing |
| Comment by RichardHill [ 24/Mar/15 ] |
|
Needs yourkit profiler to view https://drive.google.com/file/d/0ByXiyf4iY7RYTjVQMFpNV2ExZmc/view?usp=sharing https://drive.google.com/file/d/0ByXiyf4iY7RYX2Y0c3hVdTF6NFU/view?usp=sharing |
| Comment by RichardHill [ 24/Mar/15 ] |
|
This test executes a performance test of Helium SR3 on the following scenario: Unclustered configuration |
| Comment by RichardHill [ 25/Mar/15 ] |
|
Investigation of profiling snapshot revealed the suspicious method is |
| Comment by Jozef Behran [ 25/Mar/15 ] |
|
My suspection is that odl-restconf-noauth is doing something wrong with the OSGi registry (or with the Karaf's InstallFeature functionality) and when the constellation of the bytes in the process happens to be just right, it hangs the whole thing. |
| Comment by Jozef Behran [ 25/Mar/15 ] |
|
I found the following:
com.sun.jersey.core.osgi.OsgRegistry.setOSGiServiceFinderIteratorProvider() To me this thread seems to be the one trying to install odl-restconf-noauth
com.sun.jersey.core.osgi.OsgiRegistry.getInstance() I am not sure about the purpose of this thread or why it is still hanging around here but in every failure I see it hung at exactly this one spot. I was not able to track this thread's stacktrace down in a snapshot of a build where the test passed though. |
| Comment by Vaclav Demcak [ 13/Jul/15 ] |
|
Hi all. |
| Comment by RichardHill [ 23/Jul/15 ] |
|
Hi Vaclav To clarify you want the test rerun with full logging and the log provided. |
| Comment by Jozef Behran [ 23/Jul/15 ] |
|
Testing for this bug requires running the test job for several days. Additionally, the full logging is most likely to cause this type of heisenbug to disappear. As Helium SR4 is the last release and its deadline is today, there is no time to even determine whether the bug is still there or not, let alone fixing it. |