<rant>It's shocking me personally that 1 week after the Big Bump, after release code freeze and project's milestone reports, we are still in a complete mess and netvirt totally broken, and everyone (incl. the full TSC) just finds this "perfectly normal" and sort of "a week or two instability is a fair price to pay for supposed previous stability due to separate releasing of odlparent and yangtools". It seems to me that we are doing this all wrong. Controller continous integration accross all projects, with better tooling to gauge impacts, not separate releasing, seems a better model for ODL, to me, to discover this kind of thing faster and earlier.</rant>
shague reports that following the big odlparent/yangtools bump, the netvirt CSIT which runs off integration/distribution/karaf with odl-infrautils-ready and odl-netvirt-openstack, "frequently" fails during boot up, before the CSIT actually "does anything" ... we suspect that something shuts things down, and indeed this log looks very suspicious, here's the most relevant/interesting lines:
2018-01-22T11:01:25,818 | INFO | awaitility[checkBundleDiagInfos] | SystemReadyImpl | 278 - org.opendaylight.infrautils.ready-impl - 1.3.0.SNAPSHOT | checkBundleDiagInfos: Elapsed time 17s, remaining time 282s, diag: Active {Installed=0, Resolved=6, Unknown=0, GracePeriod=0, Waiting=0, Starting=0, Active=461, Stopping=0, Failure=0} 2018-01-22T11:01:25,846 | INFO | SystemReadyService-0 | TestBundleDiag | 278 - org.opendaylight.infrautils.ready-impl - 1.3.0.SNAPSHOT | diag: Active {Installed=0, Resolved=6, Unknown=0, GracePeriod=0, Waiting=0, Starting=0, Active=461, Stopping=0, Failure=0} 2018-01-22T11:01:25,847 | INFO | SystemReadyService-0 | SystemReadyImpl | 278 - org.opendaylight.infrautils.ready-impl - 1.3.0.SNAPSHOT | System ready; AKA: Aye captain, all warp coils are now operating at peak efficiency! [M.] 2018-01-22T11:01:25,860 | INFO | SystemReadyService-0 | SystemReadyImpl | 278 - org.opendaylight.infrautils.ready-impl - 1.3.0.SNAPSHOT | Now notifying all its registered SystemReadyListeners... 2018-01-22T11:01:27,018 | INFO | SystemReadyService-0 | MBeanUtils | 269 - org.opendaylight.infrautils.diagstatus-api - 1.3.0.SNAPSHOT | JMX Connector Server started for url service:jmx:rmi:///jndi/rmi://127.0.0.1:6886/server 2018-01-22T11:01:27,572 | INFO | paxweb-extender-1-thread-2 | ContextHandler | 163 - org.eclipse.jetty.util - 9.3.14.v20161028 | Started HttpServiceContext{httpContext=WebAppHttpContext{org.opendaylight.aaa.shiro - 188}} 2018-01-22T11:01:27,575 | INFO | paxweb-extender-1-thread-3 | ContextHandler | 163 - org.eclipse.jetty.util - 9.3.14.v20161028 | Started HttpServiceContext{httpContext=WebAppHttpContext{org.opendaylight.netconf.restconf-nb-bierman02 - 328}} 2018-01-22T11:01:27,723 | INFO | paxweb-extender-1-thread-1 | ContextHandler | 163 - org.eclipse.jetty.util - 9.3.14.v20161028 | Started HttpServiceContext{httpContext=WebAppHttpContext{org.opendaylight.neutron.northbound-api - 365}} 2018-01-22T11:01:27,724 | INFO | Thread-94 | TcpHandler | 388 - org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl - 0.6.0.SNAPSHOT | Switch listener started and ready to accept incoming tcp/tls connections on port: 6653 2018-01-22T11:01:27,735 | INFO | Thread-95 | OpenFlowPluginProviderImpl | 380 - org.opendaylight.openflowplugin.impl - 0.6.0.SNAPSHOT | All switchConnectionProviders are up and running (2). 2018-01-22T11:01:27,736 | INFO | Thread-95 | TcpHandler | 388 - org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl - 0.6.0.SNAPSHOT | Switch listener started and ready to accept incoming tcp/tls connections on port: 6633 2018-01-22T11:01:30,091 | INFO | Thread-91 | StateManager | 358 - org.opendaylight.netvirt.statemanager-impl - 0.6.0.SNAPSHOT | StateManager all is ready 2018-01-22T11:03:08,300 | INFO | metrics-file-reporter-1-thread-1 | MetricsFileReporter | 276 - org.opendaylight.infrautils.metrics-impl - 1.3.0.SNAPSHOT | Directory does not exist, creating it: metrics 2018-01-22T11:03:08,303 | INFO | metrics-file-reporter-1-thread-1 | MetricsFileReporter | 276 - org.opendaylight.infrautils.metrics-impl - 1.3.0.SNAPSHOT | File does not exist, creating it: metrics.2.11 2018-01-22T11:05:55,249 | WARN | BlueprintContainerRestartService | BlueprintContainerRestartServiceImpl | 192 - org.opendaylight.controller.blueprint - 0.8.0.SNAPSHOT | Failed to restart all blueprint containers within 5 minutes. Attempted to restart 1 [org.opendaylight.controller.netty-threadgroup-config_0.8.0.SNAPSHOT [209]] but only 0 completed restart 2018-01-22T11:05:55,251 | INFO | BlueprintContainerRestartService | BlueprintContainerRestartServiceImpl | 192 - org.opendaylight.controller.blueprint - 0.8.0.SNAPSHOT | Restarting blueprint containers for bundle org.opendaylight.controller.netty-threadgroup-config_0.8.0.SNAPSHOT [209] and its dependent bundles [] 2018-01-22T11:05:55,259 | INFO | Blueprint-Container-ThreadPool | BlueprintExtender | 79 - org.apache.aries.blueprint.core - 1.8.3 | Destroying BlueprintContainer for bundle org.opendaylight.controller.netty-threadgroup-config/0.8.0.SNAPSHOT 2018-01-22T11:05:55,265 | INFO | Blueprint-Container-ThreadPool | BlueprintContainerRestartServiceImpl | 192 - org.opendaylight.controller.blueprint - 0.8.0.SNAPSHOT | Restarting blueprint container for bundle org.opendaylight.controller.netty-threadgroup-config_0.8.0.SNAPSHOT [209] with paths [bundleentry://209.fwk576936864/org/opendaylight/blueprint/netty-threadgroup.xml] 2018-01-22T11:05:55,278 | ERROR | CM Configuration Updater (ManagedService Update: pid=[org.opendaylight.netty.threadgroup]) | configadmin | 5 - org.apache.felix.configadmin - 1.8.16 | [org.osgi.service.cm.ManagedService, id=576, bundle=209/mvn:org.opendaylight.controller/netty-threadgroup-config/0.8.0-SNAPSHOT]: Unexpected problem updating configuration org.opendaylight.netty.threadgroup org.osgi.service.blueprint.container.ServiceUnavailableException: The Blueprint container is being or has been destroyed: (objectClass=org.opendaylight.controller.blueprint.BlueprintContainerRestartService
so if I read above correctly, we have infrautils.ready confirming that we have 461 active bundles, after (only.. not that bad) Elapsed time 17s, so all good. Then some background thread starts 3x Jetty Web contexts, openflowplugin is happy, netvirt.statemanager-impl cheers and then... controller.blueprint decides to restart controller.netty-threadgroup-config ?? And that probably is the root cause for all the havoc we see after.