[NETCONF-655] Unable to mount devices, OOM crash Created: 05/Feb/20 Updated: 07/Feb/20 Resolved: 07/Feb/20 |
|
| Status: | Resolved |
| Project: | netconf |
| Component/s: | None |
| Affects Version/s: | Sodium SR1, Sodium SR2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | High |
| Reporter: | Jamo Luhrsen | Assignee: | Jamo Luhrsen |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
Using a device with a large yang schema and the netconf testtool simulator, mounting The basic test is to create 50 devices of the same exact type and then poll on the The karaf log shows a lot of messages like: 2020-02-05T19:02:14,675 | WARN | opendaylight-cluster-data-akka.actor.default-dispatcher-185 | Cluster(akka://opendaylight-cluster-data) | 39 - com.typesafe.akka.slf4j - 2.5.26 | Cluster Node [akka.tcp://opendaylight-cluster-data@127.0.0.1:2550] - Scheduled sending of heartbeat was delayed. Previous heartbeat was sent [18924] ms ago, expected interval is [1000] ms. This may cause failure detection to mark members as unreachable. The reason can be thread starvation, e.g. by running blocking tasks on the default dispatcher, CPU overload, or GC. 2020-02-05T19:02:51,833 | WARN | opendaylight-cluster-data-akka.actor.default-dispatcher-185 | Cluster(akka://opendaylight-cluster-data) | 39 - com.typesafe.akka.slf4j - 2.5.26 | Cluster Node [akka.tcp://opendaylight-cluster-data@127.0.0.1:2550] - Scheduled sending of heartbeat was delayed. Previous heartbeat was sent [17136] ms ago, expected interval is [1000] ms. This may cause failure detection to mark members as unreachable. The reason can be thread starvation, e.g. by running blocking tasks on the default dispatcher, CPU overload, or GC. and eventually some OOM crash messages like this: 2020-02-05T19:03:58,779 | ERROR | Framework Event Dispatcher: org.eclipse.osgi.internal.framework.EquinoxEventPublisher@4450d156 | actor | 33 - com.typesafe.akka.actor - 2.5.26 | FrameworkEvent ERROR - com.typesafe.akka.actor java.lang.OutOfMemoryError: GC overhead limit exceeded 2020-02-05T19:03:58,848 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-3 | ActorSystemImpl | 39 - com.typesafe.akka.slf4j - 2.5.26 | Uncaught error from thread [opendaylight-cluster-data-shard-dispatcher-168]: GC overhead limit exceeded, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[opendaylight-cluster-data] java.lang.OutOfMemoryError: GC overhead limit exceeded 2020-02-05T19:03:58,859 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-3 | ActorSystemImpl | 39 - com.typesafe.akka.slf4j - 2.5.26 | Uncaught error from thread [opendaylight-cluster-data-scheduler-1]: Unexpected error from weaving hook., shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[opendaylight-cluster-data] java.lang.ClassFormatError: Unexpected error from weaving hook. at org.eclipse.osgi.internal.weaving.WeavingHookConfigurator.processClass(WeavingHookConfigurator.java:73) ~[?:?] at org.eclipse.osgi.internal.loader.classpath.ClasspathManager.processClass(ClasspathManager.java:674) ~[?:?] at org.eclipse.osgi.internal.loader.classpath.ClasspathManager.defineClass(ClasspathManager.java:645) ~[?:?] at org.eclipse.osgi.internal.loader.classpath.ClasspathManager.findClassImpl(ClasspathManager.java:578) ~[?:?] at org.eclipse.osgi.internal.loader.classpath.ClasspathManager.findLocalClassImpl(ClasspathManager.java:538) ~[?:?] at org.eclipse.osgi.internal.loader.classpath.ClasspathManager.findLocalClass(ClasspathManager.java:525) ~[?:?] at org.eclipse.osgi.internal.loader.ModuleClassLoader.findLocalClass(ModuleClassLoader.java:328) ~[?:?] at org.eclipse.osgi.internal.loader.BundleLoader.findLocalClass(BundleLoader.java:368) ~[?:?] at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:446) ~[?:?] at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:395) ~[?:?] at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:387) ~[?:?] at org.eclipse.osgi.internal.loader.ModuleClassLoader.loadClass(ModuleClassLoader.java:150) ~[?:?] at java.lang.ClassLoader.loadClass(ClassLoader.java:352) ~[?:?] at akka.event.BusLogging.notifyError(Logging.scala:1932) ~[33:com.typesafe.akka.actor:2.5.26] at akka.event.LoggingAdapter.error(Logging.scala:1207) ~[33:com.typesafe.akka.actor:2.5.26] at akka.event.LoggingAdapter.error$(Logging.scala:1207) ~[33:com.typesafe.akka.actor:2.5.26] at akka.event.BusLogging.error(Logging.scala:1914) ~[33:com.typesafe.akka.actor:2.5.26] at akka.actor.LightArrayRevolverScheduler$$anon$3.run(LightArrayRevolverScheduler.scala:244) ~[33:com.typesafe.akka.actor:2.5.26] at java.lang.Thread.run(Thread.java:748) ~[?:?] Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded |
| Comments |
| Comment by Jamo Luhrsen [ 05/Feb/20 ] |
|
If I increase the max mem heap value to 7G, I can get around this issue. Note that the original sodium release default of 2G is |
| Comment by Robert Varga [ 07/Feb/20 ] |
|
Hmm, that's weird, I do not see anything in yangtools between GA and SR1 which would cause this |
| Comment by Jamo Luhrsen [ 07/Feb/20 ] |
|
Please see this 3G heap dump that was created using a recent sodium distribution (so after SR1) and trying to mount 50 devices (trying to make the crash happen sooner). This time the OOM crash took aprox 1 hour to occur. |
| Comment by Jamo Luhrsen [ 07/Feb/20 ] |
|
Sorry for the false alarm here. I was mistakenly using a modified sodium distribution when I |