[GENIUS-114] Genius karaf shutdown deadlocks Created: 19/Feb/18  Updated: 19/Feb/18  Resolved: 19/Feb/18

Status: Resolved
Project: genius
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Highest
Reporter: Michael Vorburger Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File GENIUS-114-jstack.txt    
Issue Links:
Blocks
blocks ODLPARENT-140 SFT (Pax Exam, really) sometimes gets... Open
is blocked by OVSDB-452 TransactionInvokerImpl should be daemon Resolved
is blocked by OVSDB-454 OvsdbConnectionService background thr... Resolved

 Description   

We currently cannot cleanly shut down Karaf with genius installed anymore. To reproduce:

cd genius/karaf
git checkout master
git pull
mvn clean package
target/assembly/bin/karaf
feature:install odl-genius
shutdown

This "hangs" - until you Ctrl-C it.

Attached is the jstack from this, showing a few non-daemon threads I suspect are responsible for this.

I'll open separate new linked issues for problems in projects outside of genius likely causing this.



 Comments   
Comment by Michael Vorburger [ 19/Feb/18 ]
$ cat ~/GENIUS-114-jstack.txt | grep os_prio= | grep -v daemon 

"Framework stop" #329 prio=5 os_prio=0 tid=0x00007fbd640073a0 nid=0x3e6e waiting for monitor entry [0x00007fbd47c9b000]
"nioEventLoopGroup-4-1" #291 prio=10 os_prio=0 tid=0x00007fbdbc146f60 nid=0x3d81 runnable [0x00007fbd45080000]
"Thread-39" #283 prio=5 os_prio=0 tid=0x00007fbd6c028780 nid=0x3d79 in Object.wait() [0x00007fbd45884000]
"transaction-invoker-impl-0" #280 prio=5 os_prio=0 tid=0x00007fbd843ceb00 nid=0x3d76 waiting on condition [0x00007fbd45d87000]
"pool-35-thread-2" #246 prio=5 os_prio=0 tid=0x00007fbd80081940 nid=0x3d53 waiting on condition [0x00007fbd47e9e000]
"pool-35-thread-1" #245 prio=5 os_prio=0 tid=0x00007fbd80080710 nid=0x3d52 waiting on condition [0x00007fbd47d9d000]
"config-blank-txn-0" #162 prio=5 os_prio=0 tid=0x00007fbdd0063770 nid=0x3d01 waiting on condition [0x00007fbd4ce48000]
"config-bundle-tracker-0" #157 prio=5 os_prio=0 tid=0x00007fbd54b5a350 nid=0x3cfc waiting on condition [0x00007fbd4e170000]
"RMI Reaper" #77 prio=5 os_prio=0 tid=0x00007fbd781be090 nid=0x3c90 in Object.wait() [0x00007fbd4ebec000]
"features-1-thread-1" #28 prio=5 os_prio=0 tid=0x00007fbd5813b4f0 nid=0x3c61 waiting on condition [0x00007fbd8a50b000]
"Karaf Lock Monitor Thread" #17 prio=5 os_prio=0 tid=0x00007fbdec5d4140 nid=0x3c55 waiting on condition [0x00007fbd8bb43000]
"Active Thread: Equinox Container: 5e136c7a-ea62-4200-96f7-f3a0acddaff9" #13 prio=5 os_prio=0 tid=0x00007fbdec48a880 nid=0x3c52 waiting on condition [0x00007fbdb010d000]
"main" #1 prio=5 os_prio=0 tid=0x00007fbdec00a8d0 nid=0x3c40 in Object.wait() [0x00007fbdf343e000]
"VM Thread" os_prio=0 tid=0x00007fbdec084e00 nid=0x3c45 runnable 
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fbdec01fe80 nid=0x3c41 runnable 
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fbdec0212f0 nid=0x3c42 runnable 
"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007fbdec022760 nid=0x3c43 runnable 
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007fbdec023bd0 nid=0x3c44 runnable 
"VM Periodic Task Thread" os_prio=0 tid=0x00007fbdec2b0a40 nid=0x3c4e waiting on condition

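For comparison with the grep above, the same question can be asked from inside a running JVM. This is only a minimal sketch: the class and method names are hypothetical, and unlike jstack it sees only Java threads, so JVM-internal entries such as "VM Thread" or the GC task threads will not show up.

```java
import java.util.List;
import java.util.stream.Collectors;

final class NonDaemonThreads {
    /** Names of live Java threads that are not daemons, sorted for stable output. */
    static List<String> liveNonDaemonThreadNames() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && !t.isDaemon())
                .map(Thread::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints one thread name per line; "main" is expected to be among them.
        liveNonDaemonThreadNames().forEach(System.out::println);
    }
}
```
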
I've started creating issues OPNFLWPLUG-979 and OVSDB-452 (more needed), but...

... actually I'm not so sure that these non-daemon threads are what's causing this - can others chime in here?

Non-daemon threads would only block shutdown AFTER the main thread returns, right? The main thread itself is stuck, however:

"main" #1 prio=5 os_prio=0 tid=0x00007fbdec00a8d0 nid=0x3c40 in Object.wait() [0x00007fbdf343e000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000807aadf0> (a java.util.concurrent.atomic.AtomicReference)
	at org.eclipse.osgi.container.SystemModule.waitForStop(SystemModule.java:168)
	- locked <0x00000000807aadf0> (a java.util.concurrent.atomic.AtomicReference)
	at org.eclipse.osgi.internal.framework.EquinoxBundle$SystemBundle.waitForStop(EquinoxBundle.java:250)
	at org.eclipse.osgi.launch.Equinox.waitForStop(Equinox.java:181)
	at org.apache.karaf.main.Main.awaitShutdown(Main.java:631)
	at org.apache.karaf.main.Main.main(Main.java:189)

I do not understand what this is waiting for without digging much deeper - which I don't currently have time for.
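
To illustrate the daemon question above: a JVM exits once the last non-daemon thread has finished, so background workers created as daemon threads cannot keep it alive after main returns. The following is a minimal sketch only; the class name and the "example-invoker" prefix are hypothetical, and it is not the actual change made for OVSDB-452.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

final class DaemonThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger();

    DaemonThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        Thread thread = new Thread(task, prefix + "-" + counter.getAndIncrement());
        // Daemon threads do not prevent the JVM from exiting.
        thread.setDaemon(true);
        return thread;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor =
                Executors.newSingleThreadExecutor(new DaemonThreadFactory("example-invoker"));
        // Wait for the task so its output is not lost when main returns.
        executor.submit(() -> System.out.println(Thread.currentThread().getName())).get();
        // No executor.shutdown() here: the idle worker is a daemon, so the JVM still exits.
    }
}
```

With the default thread factory the same program would keep running after main returns, because the idle pool worker would be a non-daemon thread.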

Comment by Tom Pantelis [ 19/Feb/18 ]

The stuck thread is in an Object.wait in the waitForStop. I see no other threads that are really doing anything so I suspect whatever code was supposed to notify the Object didn't for some reason, possibly due to an exception/error path that elided the notify. I think we'd need to see the karaf log that hopefully shows a smoking gun exception.
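
The suspected failure mode can be sketched as follows. This is a hypothetical illustration, not the Equinox SystemModule code: the StopSignal class and its methods are made up for the example. If the notify were placed after the stop work instead of in a finally block, an exception in that work would leave the waiter blocked, much like the main thread in the jstack.

```java
import java.util.concurrent.TimeUnit;

final class StopSignal {
    private final Object lock = new Object();
    private boolean stopped;

    /** Runs the stop work and always wakes waiters, even if the work throws. */
    void stop(Runnable stopWork) {
        try {
            stopWork.run();
        } finally {
            synchronized (lock) {
                stopped = true;
                lock.notifyAll();
            }
        }
    }

    /** Waits until stop() has finished or the timeout elapses; returns true if stopped. */
    boolean waitForStop(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            // Loop on the flag to guard against spurious wakeups.
            while (!stopped) {
                long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMillis <= 0) {
                    return false;
                }
                lock.wait(remainingMillis);
            }
            return true;
        }
    }
}
```

Waiting with a timeout also means a lost notification turns into a visible false result rather than an indefinite hang.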

Comment by Michael Vorburger [ 19/Feb/18 ]

> I think we'd need to see the karaf log that hopefully shows a smoking gun exception.

Unfortunately I wiped it, thinking this was easy to reproduce... and now I cannot reproduce it anymore - it works?!

It may be a timing issue... I do hit (other, weird) issues if I shut down too quickly, before diag returns empty.

Closing this as CANNOT REPRO - if others hit this again, please reopen with new details.

Generated at Wed Feb 07 19:59:56 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.