Uploaded image for project: 'controller'
  1. controller
  2. CONTROLLER-1809

Failed to restart all blueprint containers within 5 minutes. Attempted to restart 1 [org.opendaylight.controller.netty-threadgroup-config_0.8.0.SNAPSHOT [209]] but only 0 completed restart

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Highest Highest
    • Oxygen
    • None
    • blueprint
    • None

      <rant>It's shocking me personally that 1 week after the Big Bump, after release code freeze and project's milestone reports, we are still in a complete mess and netvirt totally broken, and everyone (incl. the full TSC) just finds this "perfectly normal" and sort of "a week or two instability is a fair price to pay for supposed previous stability due to separate releasing of odlparent and yangtools". It seems to me that we are doing this all wrong. Controller continous integration accross all projects, with better tooling to gauge impacts, not separate releasing, seems a better model for ODL, to me, to discover this kind of thing faster and earlier.</rant>

      shague reports that following the big odlparent/yangtools bump, the netvirt CSIT which runs off integration/distribution/karaf with odl-infrautils-ready and odl-netvirt-openstack, "frequently" fails during boot up, before the CSIT actually "does anything" ... we suspect that something shuts things down, and indeed this log looks very suspicious, here's the most relevant/interesting lines:

      2018-01-22T11:01:25,818 | INFO  | awaitility[checkBundleDiagInfos] | SystemReadyImpl                  | 278 - org.opendaylight.infrautils.ready-impl - 1.3.0.SNAPSHOT | checkBundleDiagInfos: Elapsed time 17s, remaining time 282s, diag: Active {Installed=0, Resolved=6, Unknown=0, GracePeriod=0, Waiting=0, Starting=0, Active=461, Stopping=0, Failure=0}
      2018-01-22T11:01:25,846 | INFO  | SystemReadyService-0 | TestBundleDiag                   | 278 - org.opendaylight.infrautils.ready-impl - 1.3.0.SNAPSHOT | diag: Active {Installed=0, Resolved=6, Unknown=0, GracePeriod=0, Waiting=0, Starting=0, Active=461, Stopping=0, Failure=0}
      2018-01-22T11:01:25,847 | INFO  | SystemReadyService-0 | SystemReadyImpl                  | 278 - org.opendaylight.infrautils.ready-impl - 1.3.0.SNAPSHOT | System ready; AKA: Aye captain, all warp coils are now operating at peak efficiency! [M.]
      2018-01-22T11:01:25,860 | INFO  | SystemReadyService-0 | SystemReadyImpl                  | 278 - org.opendaylight.infrautils.ready-impl - 1.3.0.SNAPSHOT | Now notifying all its registered SystemReadyListeners...
      2018-01-22T11:01:27,018 | INFO  | SystemReadyService-0 | MBeanUtils                       | 269 - org.opendaylight.infrautils.diagstatus-api - 1.3.0.SNAPSHOT | JMX Connector Server started for url service:jmx:rmi:///jndi/rmi://127.0.0.1:6886/server
      2018-01-22T11:01:27,572 | INFO  | paxweb-extender-1-thread-2 | ContextHandler                   | 163 - org.eclipse.jetty.util - 9.3.14.v20161028 | Started HttpServiceContext{httpContext=WebAppHttpContext{org.opendaylight.aaa.shiro - 188}}
      2018-01-22T11:01:27,575 | INFO  | paxweb-extender-1-thread-3 | ContextHandler                   | 163 - org.eclipse.jetty.util - 9.3.14.v20161028 | Started HttpServiceContext{httpContext=WebAppHttpContext{org.opendaylight.netconf.restconf-nb-bierman02 - 328}}
      2018-01-22T11:01:27,723 | INFO  | paxweb-extender-1-thread-1 | ContextHandler                   | 163 - org.eclipse.jetty.util - 9.3.14.v20161028 | Started HttpServiceContext{httpContext=WebAppHttpContext{org.opendaylight.neutron.northbound-api - 365}}
      2018-01-22T11:01:27,724 | INFO  | Thread-94        | TcpHandler                       | 388 - org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl - 0.6.0.SNAPSHOT | Switch listener started and ready to accept incoming tcp/tls connections on port: 6653
      2018-01-22T11:01:27,735 | INFO  | Thread-95        | OpenFlowPluginProviderImpl       | 380 - org.opendaylight.openflowplugin.impl - 0.6.0.SNAPSHOT | All switchConnectionProviders are up and running (2).
      2018-01-22T11:01:27,736 | INFO  | Thread-95        | TcpHandler                       | 388 - org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl - 0.6.0.SNAPSHOT | Switch listener started and ready to accept incoming tcp/tls connections on port: 6633
      2018-01-22T11:01:30,091 | INFO  | Thread-91        | StateManager                     | 358 - org.opendaylight.netvirt.statemanager-impl - 0.6.0.SNAPSHOT | StateManager all is ready
      2018-01-22T11:03:08,300 | INFO  | metrics-file-reporter-1-thread-1 | MetricsFileReporter              | 276 - org.opendaylight.infrautils.metrics-impl - 1.3.0.SNAPSHOT | Directory does not exist, creating it: metrics
      2018-01-22T11:03:08,303 | INFO  | metrics-file-reporter-1-thread-1 | MetricsFileReporter              | 276 - org.opendaylight.infrautils.metrics-impl - 1.3.0.SNAPSHOT | File does not exist, creating it: metrics.2.11
      2018-01-22T11:05:55,249 | WARN  | BlueprintContainerRestartService | BlueprintContainerRestartServiceImpl | 192 - org.opendaylight.controller.blueprint - 0.8.0.SNAPSHOT | Failed to restart all blueprint containers within 5 minutes. Attempted to restart 1 [org.opendaylight.controller.netty-threadgroup-config_0.8.0.SNAPSHOT [209]] but only 0 completed restart
      2018-01-22T11:05:55,251 | INFO  | BlueprintContainerRestartService | BlueprintContainerRestartServiceImpl | 192 - org.opendaylight.controller.blueprint - 0.8.0.SNAPSHOT | Restarting blueprint containers for bundle org.opendaylight.controller.netty-threadgroup-config_0.8.0.SNAPSHOT [209] and its dependent bundles []
      2018-01-22T11:05:55,259 | INFO  | Blueprint-Container-ThreadPool | BlueprintExtender                | 79 - org.apache.aries.blueprint.core - 1.8.3 | Destroying BlueprintContainer for bundle org.opendaylight.controller.netty-threadgroup-config/0.8.0.SNAPSHOT
      2018-01-22T11:05:55,265 | INFO  | Blueprint-Container-ThreadPool | BlueprintContainerRestartServiceImpl | 192 - org.opendaylight.controller.blueprint - 0.8.0.SNAPSHOT | Restarting blueprint container for bundle org.opendaylight.controller.netty-threadgroup-config_0.8.0.SNAPSHOT [209] with paths [bundleentry://209.fwk576936864/org/opendaylight/blueprint/netty-threadgroup.xml]
      2018-01-22T11:05:55,278 | ERROR | CM Configuration Updater (ManagedService Update: pid=[org.opendaylight.netty.threadgroup]) | configadmin                      | 5 - org.apache.felix.configadmin - 1.8.16 | [org.osgi.service.cm.ManagedService, id=576, bundle=209/mvn:org.opendaylight.controller/netty-threadgroup-config/0.8.0-SNAPSHOT]: Unexpected problem updating configuration org.opendaylight.netty.threadgroup
      org.osgi.service.blueprint.container.ServiceUnavailableException: The Blueprint container is being or has been destroyed: (objectClass=org.opendaylight.controller.blueprint.BlueprintContainerRestartService

      so if I read above correctly, we have infrautils.ready confirming that we have 461 active bundles, after (only.. not that bad) Elapsed time 17s, so all good. Then some background thread starts 3x Jetty Web contexts, openflowplugin is happy, netvirt.statemanager-impl cheers and then... controller.blueprint decides to restart controller.netty-threadgroup-config ?? And that probably is the root cause for all the havoc we see after.

        1. karaf_with_c67429.log
          896 kB
          Michael Vorburger
        2. karaf.log
          961 kB
          Michael Vorburger

            vorburger Michael Vorburger
            vorburger Michael Vorburger
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: