[INFRAUTILS-66] Threadswatcher causing safepoint pauses due to calling FindDeadlocks every 500ms Created: 25/Nov/19 Updated: 29/Nov/19 Resolved: 29/Nov/19 |
|
| Status: | Resolved |
| Project: | infrautils |
| Component/s: | metrics |
| Affects Version/s: | Oxygen-SR4, Neon, Sodium, Magnesium |
| Fix Version/s: | Magnesium, Sodium SR2, Neon SR3 |
| Type: | Bug | Priority: | High |
| Reporter: | Tomas Cere | Assignee: | Robert Varga |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
We experience frequent safepoint pauses seen in this log:
// code
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
2653.486: FindDeadlocks [ 870 1 1 ] [ 1 0 2 1 0 ] 1
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
2653.986: FindDeadlocks [ 870 0 0 ] [ 0 0 0 1 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
2654.486: FindDeadlocks [ 870 0 3 ] [ 0 0 0 1 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
2654.986: FindDeadlocks [ 870 0 2 ] [ 0 0 0 1 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
2655.486: FindDeadlocks [ 871 0 1 ] [ 0 0 0 1 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
2655.986: FindDeadlocks [ 871 0 0 ] [ 0 0 0 1 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
2656.486: FindDeadlocks [ 870 0 0 ] [ 0 0 0 1 0 ] 0
Which was traced to ThreadsWatcher calling FindDeadlocks every 500ms. |
| Comments |
| Comment by Robert Varga [ 25/Nov/19 ] |
|
This results in large pauses as evidenced by: 2019-11-15T15:43:36.928+0000: 3304.685: Total time for which application threads were stopped: 9.2916061 seconds, Stopping threads took: 0.0001933 seconds 2019-11-15T15:44:47.185+0000: 3374.942: Total time for which application threads were stopped: 9.5493942 seconds, Stopping threads took: 0.0001182 seconds 2019-11-15T15:46:00.042+0000: 3447.800: Total time for which application threads were stopped: 12.4064250 seconds, Stopping threads took: 0.0001305 seconds 2019-11-15T15:47:06.771+0000: 3514.528: Total time for which application threads were stopped: 6.1348460 seconds, Stopping threads took: 0.0005121 seconds 2019-11-15T15:48:12.704+0000: 3580.462: Total time for which application threads were stopped: 5.5681979 seconds, Stopping threads took: 0.0002656 seconds 2019-11-15T15:49:20.000+0000: 3647.758: Total time for which application threads were stopped: 6.8645785 seconds, Stopping threads took: 0.0001139 seconds 2019-11-15T15:50:22.866+0000: 3710.623: Total time for which application threads were stopped: 2.6894180 seconds, Stopping threads took: 0.0002664 seconds 2019-11-15T15:51:29.921+0000: 3777.678: Total time for which application threads were stopped: 6.7841151 seconds, Stopping threads took: 0.0001327 seconds 2019-11-15T15:52:35.962+0000: 3843.719: Total time for which application threads were stopped: 5.8258694 seconds, Stopping threads took: 0.0001516 seconds 2019-11-15T15:53:54.924+0000: 3922.681: Total time for which application threads were stopped: 4.7865003 seconds, Stopping threads took: 0.0002752 seconds 2019-11-15T15:55:13.234+0000: 4000.992: Total time for which application threads were stopped: 3.0976428 seconds, Stopping threads took: 0.0002604 seconds 2019-11-15T15:57:38.405+0000: 4146.162: Total time for which application threads were stopped: 4.2689397 seconds, Stopping threads took: 0.0001862 seconds 2019-11-15T15:58:45.222+0000: 4212.980: Total time for which application threads were stopped: 6.5856216 seconds, Stopping threads took: 0.0014271 seconds i.e. every 60 seconds we force a stop-the-world for 3-12 seconds. I think this facility should be turned off by default. |
| Comment by Michael Vorburger [ 26/Nov/19 ] |
|
Fixing this sounds like a great idea! JIRA assigns this to me automatically (default assignee), but I'm guessing you'd like to fix this? Go for it! (Unassigned from myself just to clarify I'm not actually going to work on this myself - but I'm happy to review any related Gerrit.) |