  1. netvirt
  2. NETVIRT-1249

Thread terminated due to an uncaught exception


    • Type: Bug
    • Resolution: Done
    • Priority: Medium
    • None
    • Fluorine
    • General
    • None
    • Normal

      When running downstream scale/perf testing, some VMs remain unpingable even after 300 seconds. The karaf logs show many exceptions like the one below when the following test steps are run and the ping fails:

      Create a network
      Create a subnet
      Create a router
      Attach router to subnet and public network
      Boot VM with floating IP
      Ping VM

      Exceptions
      --------------

      2018-04-13T22:57:05,411 | ERROR | org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | AsyncDataTreeChangeListenerBase  | 263 - org.opendaylight.genius.mdsalutil-api - 0.4.0.redhat-5 | Thread terminated due to uncaught exception: org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0
      java.lang.IllegalArgumentException: Invalid range: -1, expected: [[0..18446744073709551615]].
              at org.opendaylight.yang.gen.v1.urn.opendaylight.model.match.types.rev131026.match.TunnelBuilder.checkTunnelIdRange(TunnelBuilder.java:76) 
              at org.opendaylight.yang.gen.v1.urn.opendaylight.model.match.types.rev131026.match.TunnelBuilder.setTunnelId(TunnelBuilder.java:81) 
              at org.opendaylight.genius.mdsalutil.matches.MatchTunnelId.populateBuilder(MatchTunnelId.java:40) 
              at org.opendaylight.genius.mdsalutil.matches.MatchTunnelId.populateBuilder(MatchTunnelId.java:18) 
              at org.opendaylight.genius.mdsalutil.matches.MatchInfoHelper.createInnerMatchBuilder(MatchInfoHelper.java:29) [
              at org.opendaylight.genius.mdsalutil.MDSALUtil.buildMatches(MDSALUtil.java:336) 
              at org.opendaylight.genius.mdsalutil.FlowEntity.getFlowBuilder(FlowEntity.java:87) 
              at org.opendaylight.genius.mdsalutil.internal.MDSALManager.writeFlowEntityInternal(MDSALManager.java:180) 
              at org.opendaylight.genius.mdsalutil.internal.MDSALManager.installFlowInternal(MDSALManager.java:142) 
              at org.opendaylight.genius.mdsalutil.internal.MDSALManager.installFlow(MDSALManager.java:609) 
              at Proxya2c1fd24_dd19_4d37_b669_2a61174cba99.installFlow(Unknown Source) [?:?]
              at Proxy8f8e72d8_5b27_48d7_a9e4_a78c5736e1ae.installFlow(Unknown Source) [?:?]
              at org.opendaylight.netvirt.natservice.internal.AbstractSnatService.syncFlow(AbstractSnatService.java:329)
              at org.opendaylight.netvirt.natservice.internal.ConntrackBasedSnatService.installTerminatingServiceTblEntry(ConntrackBasedSnatService.java:148) 
              at org.opendaylight.netvirt.natservice.internal.ConntrackBasedSnatService.installSnatSpecificEntriesForNaptSwitch(ConntrackBasedSnatService.java:73) 
              at org.opendaylight.netvirt.natservice.internal.AbstractSnatService.handleSnat(AbstractSnatService.java:126) 
              at org.opendaylight.netvirt.natservice.internal.FlatVlanConntrackBasedSnatService.handleSnat(FlatVlanConntrackBasedSnatService.java:49)
              at org.opendaylight.netvirt.natservice.internal.AbstractSnatService.handleSnatAllSwitch(AbstractSnatService.java:105)
              at org.opendaylight.netvirt.natservice.internal.FlatVlanConntrackBasedSnatService.handleSnatAllSwitch(FlatVlanConntrackBasedSnatService.java:39) 
              at org.opendaylight.netvirt.natservice.internal.SnatServiceManagerImpl.notify(SnatServiceManagerImpl.java:57)
              at org.opendaylight.netvirt.natservice.ha.SnatCentralizedSwitchChangeListener.add(SnatCentralizedSwitchChangeListener.java:87) 
              at org.opendaylight.netvirt.natservice.ha.SnatCentralizedSwitchChangeListener.add(SnatCentralizedSwitchChangeListener.java:32) 
              at org.opendaylight.genius.datastoreutils.AsyncDataTreeChangeListenerBase$DataTreeChangeHandler.run(AsyncDataTreeChangeListenerBase.java:167) 
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:?]
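
The trace above shows `TunnelBuilder.setTunnelId` rejecting `-1` because it lies outside `[0..18446744073709551615]`. The following is a minimal sketch of checking that range before a flow match is built. It assumes the caller holds the ID as a `long` and that `-1` is a "not found" sentinel, which this issue does not confirm. `TunnelIdGuard` and its methods are hypothetical names and are not part of genius or netvirt.

```java
import java.math.BigInteger;
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of genius or netvirt. */
final class TunnelIdGuard {
    // Upper bound quoted in the exception message: 2^64 - 1.
    static final BigInteger MAX_TUNNEL_ID = new BigInteger("18446744073709551615");

    private TunnelIdGuard() {
    }

    /** Returns true when the ID lies in the closed range [0, 2^64 - 1]. */
    static boolean isValid(BigInteger id) {
        return id != null && id.signum() >= 0 && id.compareTo(MAX_TUNNEL_ID) <= 0;
    }

    /** Converts a raw ID, yielding empty for out-of-range values such as -1. */
    static Optional<BigInteger> toTunnelId(long rawId) {
        BigInteger id = BigInteger.valueOf(rawId);
        return isValid(id) ? Optional.of(id) : Optional.empty();
    }
}
```

A caller could skip or defer flow installation when `toTunnelId` returns empty, rather than letting the builder throw inside the listener thread.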
      

      Looking at the rest of the logs, the core issue is that the notification for the router GW port creation is received before the neutron router is created.

      2018-05-03T13:22:28,967 | WARN  | org.opendaylight.yang.gen.v1.urn.opendaylight.neutron.ports.rev150712.ports.attributes.ports.Port_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | NeutronPortChangeListener        | 362 - org.opendaylight.netvirt.neutronvpn-impl - 0.6.0 | No router found for router GW port c69fd64f-4c6a-43a6-ae21-c72f11d29a72 for router 29ec1425-72df-4b11-9327-527cf41d58a9
      
      
      2018-05-03T13:22:29,263 | INFO  | org.opendaylight.yang.gen.v1.urn.opendaylight.neutron.l3.rev150712.routers.attributes.routers.Router_AsyncClusteredDataTreeChangeListenerBase-DataTreeChangeHandler-0 | NeutronRouterChangeListener      | 356 - org.opendaylight.netvirt.ipv6service-impl - 0.6.0 | Add Router notification handler is invoked Uuid [_value=29ec1425-72df-4b11-9327-527cf41d58a9].
      
      2018-05-03T13:22:29,283 | INFO  | ForkJoinPool-2-worker-34 | VpnInstanceListener              | 370 - org.opendaylight.netvirt.vpnmanager-impl - 0.6.0| VPN-ADD: addVpnInstance: VPN Id 106159 generated for VpnInstanceName 29ec1425-72df-4b11-9327-527cf41d58a9
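
In these log lines the GW port notification (13:22:28,967) precedes the router notification (13:22:29,263). One possible way to tolerate that ordering is to park the port until its router shows up. The sketch below only illustrates the idea. `PendingGwPorts` and its method names are hypothetical, IDs are plain strings for brevity, and nothing here is taken from the netvirt code base.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical tracker for illustration only; not taken from netvirt. */
final class PendingGwPorts {
    private final Set<String> knownRouters = new HashSet<>();
    private final Map<String, List<String>> pendingByRouter = new HashMap<>();

    /** Returns true if the port can be processed now; otherwise parks it. */
    synchronized boolean onGwPortAdded(String routerId, String portId) {
        if (knownRouters.contains(routerId)) {
            return true;
        }
        pendingByRouter.computeIfAbsent(routerId, k -> new ArrayList<>()).add(portId);
        return false;
    }

    /** Records the router and returns the GW ports that were waiting for it. */
    synchronized List<String> onRouterAdded(String routerId) {
        knownRouters.add(routerId);
        List<String> parked = pendingByRouter.remove(routerId);
        return parked == null ? new ArrayList<>() : parked;
    }
}
```

With this shape, the router-add handler would replay whatever `onRouterAdded` returns, so a GW port that arrived early is still processed once the router exists.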
      

          Router router = neutronvpnUtils.getNeutronRouter(routerId);
          if (router == null) {
              LOG.warn("No router found for router GW port {} for router {}", routerGwPort.getUuid().getValue(),
                      routerId.getValue());
              return;
          }
          gwMacResolver.sendArpRequestsToExtGateways(router);

          setExternalGwMac(routerGwPort, routerId);
      }

      private void setExternalGwMac(Port routerGwPort, Uuid routerId) {
          // During full-sync networking-odl syncs routers before ports. As such,
          // the MAC of the router's gw port is not available to be set when the
          // router is written. We catch that here.

      This means it is not an issue of notifications being delivered out of order, but of the GW port notification arriving before the router was created in MD-SAL. The most likely reason is networking-odl (n-odl) sending them out of order, but that should not result in an unhandled exception and thread termination.
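
To illustrate the last point, here is a minimal sketch of a handler loop in which a failure on one event is recorded and the remaining events are still processed. `ResilientHandler` is a hypothetical stand-in and not the genius `AsyncDataTreeChangeListenerBase`. The failure list takes the place of a logger call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a change-handler loop; illustration only. */
final class ResilientHandler<T> {
    private final Consumer<T> delegate;
    private final List<String> failures = new ArrayList<>();

    ResilientHandler(Consumer<T> delegate) {
        this.delegate = delegate;
    }

    /** Processes every event and returns how many were handled without error. */
    int handleAll(List<T> events) {
        int handled = 0;
        for (T event : events) {
            try {
                delegate.accept(event);
                handled++;
            } catch (RuntimeException e) {
                // Real code would log the error with its stack trace here.
                failures.add(event + ": " + e.getMessage());
            }
        }
        return handled;
    }

    /** Returns one entry per event whose processing threw. */
    List<String> failures() {
        return failures;
    }
}
```

Catching per event keeps the worker alive, at the cost of having to make sure the recorded failures are actually surfaced and not silently dropped.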

            thapar Vishal Thapar