netconf / NETCONF-655

Unable to mount devices, OOM crash


Details

    • Type: Bug
    • Status: Resolved
    • Priority: High
    • Resolution: Won't Do
    • Affects Version/s: Sodium SR1, Sodium SR2

    Description

      Using a device with a large YANG schema and the netconf testtool simulator, mounting
      devices fails, and I have observed OOM crashes as well. This does not happen with the
      original/first Sodium release, but it does happen with both SR1 and the most recent
      Sodium build, which is presumably the SR2 release.
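
      For reference, here is a minimal sketch of how each of the 50 mount requests can be
      issued against the controller's RESTCONF config datastore. The node-id, host, port
      17830, and admin/admin credentials are assumptions based on the testtool's usual
      defaults, not values taken from this report:

      import requests

      # Hypothetical values: the first simulated testtool device typically listens
      # on port 17830 and accepts admin/admin; adjust to match the real setup.
      ODL = "http://127.0.0.1:8181"
      NODE_ID = "sim-device-1"

      payload = {
          "node": [{
              "node-id": NODE_ID,
              "netconf-node-topology:host": "127.0.0.1",
              "netconf-node-topology:port": 17830,
              "netconf-node-topology:username": "admin",
              "netconf-node-topology:password": "admin",
              "netconf-node-topology:tcp-only": False,
          }]
      }

      # Writing the node into the config datastore triggers the connector to
      # connect, download and compile the device schemas, and mount the device.
      resp = requests.put(
          f"{ODL}/restconf/config/network-topology:network-topology"
          f"/topology/topology-netconf/node/{NODE_ID}",
          json=payload,
          auth=("admin", "admin"),
      )
      resp.raise_for_status()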

      The basic test is to create 50 devices of the exact same type and then poll the
      operational topology (every 2 s) until all of them are marked as "connected". The
      simulated device takes up approximately 375 MB of heap after YANG compilation, and with
      the original Sodium release all 50 devices are compiled and mounted in approximately
      3 minutes. (NOTE: I am not sure I can publicly share the YANG models for this device at
      this time.) With SR1 and the presumed SR2 build, none of the devices are mounted.
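
      A minimal sketch of that poll loop, assuming the bierman02-style RESTCONF endpoint and
      the default admin/admin credentials that Sodium ships with:

      import time
      import requests

      ODL = "http://127.0.0.1:8181"
      TOPOLOGY = (ODL + "/restconf/operational/network-topology:network-topology"
                        "/topology/topology-netconf")

      def count_connected():
          # Read the operational topology and count nodes whose connector
          # reports connection-status == "connected".
          resp = requests.get(TOPOLOGY, auth=("admin", "admin"))
          resp.raise_for_status()
          nodes = resp.json()["topology"][0].get("node", [])
          return sum(1 for n in nodes
                     if n.get("netconf-node-topology:connection-status") == "connected")

      EXPECTED = 50
      while count_connected() < EXPECTED:
          time.sleep(2)  # poll every 2 s, as in the test described above
      print(f"all {EXPECTED} devices mounted")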

      The karaf log shows a lot of messages like:

      2020-02-05T19:02:14,675 | WARN  | opendaylight-cluster-data-akka.actor.default-dispatcher-185 | Cluster(akka://opendaylight-cluster-data) | 39 - com.typesafe.akka.slf4j - 2.5.26 | Cluster Node [akka.tcp://opendaylight-cluster-data@127.0.0.1:2550] - Scheduled sending of heartbeat was delayed. Previous heartbeat was sent [18924] ms ago, expected interval is [1000] ms. This may cause failure detection to mark members as unreachable. The reason can be thread starvation, e.g. by running blocking tasks on the default dispatcher, CPU overload, or GC.
      2020-02-05T19:02:51,833 | WARN  | opendaylight-cluster-data-akka.actor.default-dispatcher-185 | Cluster(akka://opendaylight-cluster-data) | 39 - com.typesafe.akka.slf4j - 2.5.26 | Cluster Node [akka.tcp://opendaylight-cluster-data@127.0.0.1:2550] - Scheduled sending of heartbeat was delayed. Previous heartbeat was sent [17136] ms ago, expected interval is [1000] ms. This may cause failure detection to mark members as unreachable. The reason can be thread starvation, e.g. by running blocking tasks on the default dispatcher, CPU overload, or GC.
      

      and eventually some OOM crash messages like these:

      2020-02-05T19:03:58,779 | ERROR | Framework Event Dispatcher: org.eclipse.osgi.internal.framework.EquinoxEventPublisher@4450d156 | actor                            | 33 - com.typesafe.akka.actor - 2.5.26 | FrameworkEvent ERROR - com.typesafe.akka.actor
      java.lang.OutOfMemoryError: GC overhead limit exceeded
      2020-02-05T19:03:58,848 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-3 | ActorSystemImpl                  | 39 - com.typesafe.akka.slf4j - 2.5.26 | Uncaught error from thread [opendaylight-cluster-data-shard-dispatcher-168]: GC overhead limit exceeded, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[opendaylight-cluster-data]
      java.lang.OutOfMemoryError: GC overhead limit exceeded
      2020-02-05T19:03:58,859 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-3 | ActorSystemImpl                  | 39 - com.typesafe.akka.slf4j - 2.5.26 | Uncaught error from thread [opendaylight-cluster-data-scheduler-1]: Unexpected error from weaving hook., shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[opendaylight-cluster-data]
      java.lang.ClassFormatError: Unexpected error from weaving hook.
              at org.eclipse.osgi.internal.weaving.WeavingHookConfigurator.processClass(WeavingHookConfigurator.java:73) ~[?:?]
              at org.eclipse.osgi.internal.loader.classpath.ClasspathManager.processClass(ClasspathManager.java:674) ~[?:?]
              at org.eclipse.osgi.internal.loader.classpath.ClasspathManager.defineClass(ClasspathManager.java:645) ~[?:?]
              at org.eclipse.osgi.internal.loader.classpath.ClasspathManager.findClassImpl(ClasspathManager.java:578) ~[?:?]
              at org.eclipse.osgi.internal.loader.classpath.ClasspathManager.findLocalClassImpl(ClasspathManager.java:538) ~[?:?]
              at org.eclipse.osgi.internal.loader.classpath.ClasspathManager.findLocalClass(ClasspathManager.java:525) ~[?:?]
              at org.eclipse.osgi.internal.loader.ModuleClassLoader.findLocalClass(ModuleClassLoader.java:328) ~[?:?]
              at org.eclipse.osgi.internal.loader.BundleLoader.findLocalClass(BundleLoader.java:368) ~[?:?]
              at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:446) ~[?:?]
              at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:395) ~[?:?]
              at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:387) ~[?:?]
              at org.eclipse.osgi.internal.loader.ModuleClassLoader.loadClass(ModuleClassLoader.java:150) ~[?:?]
              at java.lang.ClassLoader.loadClass(ClassLoader.java:352) ~[?:?]
              at akka.event.BusLogging.notifyError(Logging.scala:1932) ~[33:com.typesafe.akka.actor:2.5.26]
              at akka.event.LoggingAdapter.error(Logging.scala:1207) ~[33:com.typesafe.akka.actor:2.5.26]
              at akka.event.LoggingAdapter.error$(Logging.scala:1207) ~[33:com.typesafe.akka.actor:2.5.26]
              at akka.event.BusLogging.error(Logging.scala:1914) ~[33:com.typesafe.akka.actor:2.5.26]
              at akka.actor.LightArrayRevolverScheduler$$anon$3.run(LightArrayRevolverScheduler.scala:244) ~[33:com.typesafe.akka.actor:2.5.26]
              at java.lang.Thread.run(Thread.java:748) ~[?:?]
      Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
      


    People

      Assignee: Jamo Luhrsen (jluhrsen)
      Reporter: Jamo Luhrsen (jluhrsen)
      Votes: 0
      Watchers: 2
