[NETVIRT-1249] Thread terminated due to an uncaught exception Created: 08/May/18  Updated: 04/Jul/18  Resolved: 04/Jul/18

Status: Resolved
Project: netvirt
Component/s: General
Affects Version/s: Fluorine
Fix Version/s: None

Type: Bug Priority: Medium
Reporter: Vishal Thapar Assignee: Vishal Thapar
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

When running following test case in downstream scale/perf setup
Create a network
Create a sbunet
Create a router
Attach router to subnet and oublic network
Boot VM with floating IP
Ping VM

we see that some VMs remain unpingable even after 300 seconds and we see a lot of the following exceptions in karaf logs

2018-04-13T22:57:05,411 | ERROR | org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | AsyncDataTreeChangeListenerBase | 263 - org.opendaylight.genius.mdsalutil-api - 0.4.0 | Thread terminated due to uncaught exception: org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0
java.lang.IllegalArgumentException: Invalid range: -1, expected: [[0..18446744073709551615]].
at org.opendaylight.yang.gen.v1.urn.opendaylight.model.match.types.rev131026.match.TunnelBuilder.checkTunnelIdRange(TunnelBuilder.java:76) [391:org.opendaylight.openflowplugin.model.flow-base:0.6.0]
at org.opendaylight.yang.gen.v1.urn.opendaylight.model.match.types.rev131026.match.TunnelBuilder.setTunnelId(TunnelBuilder.java:81) [391:org.opendaylight.openflowplugin.model.flow-base:0.6.0]
at org.opendaylight.genius.mdsalutil.matches.MatchTunnelId.populateBuilder(MatchTunnelId.java:40) [263:org.opendaylight.genius.mdsalutil-api:0.4.0]
at org.opendaylight.genius.mdsalutil.matches.MatchTunnelId.populateBuilder(MatchTunnelId.java:18) [263:org.opendaylight.genius.mdsalutil-api:0.4.0]
at org.opendaylight.genius.mdsalutil.matches.MatchInfoHelper.createInnerMatchBuilder(MatchInfoHelper.java:29) [263:org.opendaylight.genius.mdsalutil-api:0.4.0]
at org.opendaylight.genius.mdsalutil.MDSALUtil.buildMatches(MDSALUtil.java:336) [263:org.opendaylight.genius.mdsalutil-api:0.4.0]
at org.opendaylight.genius.mdsalutil.FlowEntity.getFlowBuilder(FlowEntity.java:87) [263:org.opendaylight.genius.mdsalutil-api:0.4.0]
at org.opendaylight.genius.mdsalutil.internal.MDSALManager.writeFlowEntityInternal(MDSALManager.java:180) [264:org.opendaylight.genius.mdsalutil-impl:0.4.0]
at org.opendaylight.genius.mdsalutil.internal.MDSALManager.installFlowInternal(MDSALManager.java:142) [264:org.opendaylight.genius.mdsalutil-impl:0.4.0]
at org.opendaylight.genius.mdsalutil.internal.MDSALManager.installFlow(MDSALManager.java:609) [264:org.opendaylight.genius.mdsalutil-impl:0.4.0]
at Proxya2c1fd24_dd19_4d37_b669_2a61174cba99.installFlow(Unknown Source) [?:?]
at Proxy8f8e72d8_5b27_48d7_a9e4_a78c5736e1ae.installFlow(Unknown Source) [?:?]
at org.opendaylight.netvirt.natservice.internal.AbstractSnatService.syncFlow(AbstractSnatService.java:329) [360:org.opendaylight.netvirt.natservice-impl:0.6.0]
at org.opendaylight.netvirt.natservice.internal.ConntrackBasedSnatService.installTerminatingServiceTblEntry(ConntrackBasedSnatService.java:148) [360:org.opendaylight.netvirt.natservice-impl:0.6.0]
at org.opendaylight.netvirt.natservice.internal.ConntrackBasedSnatService.installSnatSpecificEntriesForNaptSwitch(ConntrackBasedSnatService.java:73) [360:org.opendaylight.netvirt.natservice-impl:0.6.0]
at org.opendaylight.netvirt.natservice.internal.AbstractSnatService.handleSnat(AbstractSnatService.java:126) [360:org.opendaylight.netvirt.natservice-impl:0.6.0]
at org.opendaylight.netvirt.natservice.internal.FlatVlanConntrackBasedSnatService.handleSnat(FlatVlanConntrackBasedSnatService.java:49) [360:org.opendaylight.netvirt.natservice-impl:0.6.0]
at org.opendaylight.netvirt.natservice.internal.AbstractSnatService.handleSnatAllSwitch(AbstractSnatService.java:105) [360:org.opendaylight.netvirt.natservice-impl:0.6.0]
at org.opendaylight.netvirt.natservice.internal.FlatVlanConntrackBasedSnatService.handleSnatAllSwitch(FlatVlanConntrackBasedSnatService.java:39) [360:org.opendaylight.netvirt.natservice-impl:0.6.0]
at org.opendaylight.netvirt.natservice.internal.SnatServiceManagerImpl.notify(SnatServiceManagerImpl.java:57) [360:org.opendaylight.netvirt.natservice-impl:0.6.0]
at org.opendaylight.netvirt.natservice.ha.SnatCentralizedSwitchChangeListener.add(SnatCentralizedSwitchChangeListener.java:87) [360:org.opendaylight.netvirt.natservice-impl:0.6.0]
at org.opendaylight.netvirt.natservice.ha.SnatCentralizedSwitchChangeListener.add(SnatCentralizedSwitchChangeListener.java:32) [360:org.opendaylight.netvirt.natservice-impl:0.6.0]
at org.opendaylight.genius.datastoreutils.AsyncDataTreeChangeListenerBase$DataTreeChangeHandler.run(AsyncDataTreeChangeListenerBase.java:167) [263:org.opendaylight.genius.mdsalutil-api:0.4.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:?]

Core issue is getting gw port notification before router was created in mdsal as confirmed by following log entry:

2018-05-03T13:22:28,967 | WARN | org.opendaylight.yang.gen.v1.urn.opendaylight.neutron.ports.rev150712.ports.attributes.ports.Port_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | NeutronPortChangeListener | 362 - org.opendaylight.netvirt.neutronvpn-impl - 0.6.0 | No router found for router GW port c69fd64f-4c6a-43a6-ae21-c72f11d29a72 for router 29ec1425-72df-4b11-9327-527cf41d58a9

{{ Router router = neutronvpnUtils.getNeutronRouter(routerId);
if (router == null) {
LOG.warn("No router found for router GW port {} for router {}", routerGwPort.getUuid().getValue(),
routerId.getValue());
return;
}
gwMacResolver.sendArpRequestsToExtGateways(router);

setExternalGwMac(routerGwPort, routerId);
}

private void setExternalGwMac(Port routerGwPort, Uuid routerId) {
// During full-sync networking-odl syncs routers before ports. As such,
// the MAC of the router's gw port is not available to be set when the
// router is written. We catch that here.}}

The most likely reason for this is n-odl sending out of order and should be addressed in n-odl/NN. But even then it shouldn't lead to unhandled exception and thread termination. Since it is a race condition, a simple retry might be enough to avoid this problem.


Priority: Normal

 Description   

When running downstream scale/perf testing we see that some VMs remain unpingable even after 300 seconds and we see a lot of the exceptions in karaf logs when running following tests and ping fails

Create a network
Create a sbunet
Create a router
Attach router to subnet and oublic network
Boot VM with floating IP
Ping VM

Exceptions
--------------

2018-04-13T22:57:05,411 | ERROR | org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | AsyncDataTreeChangeListenerBase  | 263 - org.opendaylight.genius.mdsalutil-api - 0.4.0.redhat-5 | Thread terminated due to uncaught exception: org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0
java.lang.IllegalArgumentException: Invalid range: -1, expected: [[0..18446744073709551615]].
        at org.opendaylight.yang.gen.v1.urn.opendaylight.model.match.types.rev131026.match.TunnelBuilder.checkTunnelIdRange(TunnelBuilder.java:76) 
        at org.opendaylight.yang.gen.v1.urn.opendaylight.model.match.types.rev131026.match.TunnelBuilder.setTunnelId(TunnelBuilder.java:81) 
        at org.opendaylight.genius.mdsalutil.matches.MatchTunnelId.populateBuilder(MatchTunnelId.java:40) 
        at org.opendaylight.genius.mdsalutil.matches.MatchTunnelId.populateBuilder(MatchTunnelId.java:18) 
        at org.opendaylight.genius.mdsalutil.matches.MatchInfoHelper.createInnerMatchBuilder(MatchInfoHelper.java:29) [
        at org.opendaylight.genius.mdsalutil.MDSALUtil.buildMatches(MDSALUtil.java:336) 
        at org.opendaylight.genius.mdsalutil.FlowEntity.getFlowBuilder(FlowEntity.java:87) 
        at org.opendaylight.genius.mdsalutil.internal.MDSALManager.writeFlowEntityInternal(MDSALManager.java:180) 
        at org.opendaylight.genius.mdsalutil.internal.MDSALManager.installFlowInternal(MDSALManager.java:142) 
        at org.opendaylight.genius.mdsalutil.internal.MDSALManager.installFlow(MDSALManager.java:609) 
        at Proxya2c1fd24_dd19_4d37_b669_2a61174cba99.installFlow(Unknown Source) [?:?]
        at Proxy8f8e72d8_5b27_48d7_a9e4_a78c5736e1ae.installFlow(Unknown Source) [?:?]
        at org.opendaylight.netvirt.natservice.internal.AbstractSnatService.syncFlow(AbstractSnatService.java:329)
        at org.opendaylight.netvirt.natservice.internal.ConntrackBasedSnatService.installTerminatingServiceTblEntry(ConntrackBasedSnatService.java:148) 
        at org.opendaylight.netvirt.natservice.internal.ConntrackBasedSnatService.installSnatSpecificEntriesForNaptSwitch(ConntrackBasedSnatService.java:73) 
        at org.opendaylight.netvirt.natservice.internal.AbstractSnatService.handleSnat(AbstractSnatService.java:126) 
        at org.opendaylight.netvirt.natservice.internal.FlatVlanConntrackBasedSnatService.handleSnat(FlatVlanConntrackBasedSnatService.java:49)
        at org.opendaylight.netvirt.natservice.internal.AbstractSnatService.handleSnatAllSwitch(AbstractSnatService.java:105)
        at org.opendaylight.netvirt.natservice.internal.FlatVlanConntrackBasedSnatService.handleSnatAllSwitch(FlatVlanConntrackBasedSnatService.java:39) 
        at org.opendaylight.netvirt.natservice.internal.SnatServiceManagerImpl.notify(SnatServiceManagerImpl.java:57)
        at org.opendaylight.netvirt.natservice.ha.SnatCentralizedSwitchChangeListener.add(SnatCentralizedSwitchChangeListener.java:87) 
        at org.opendaylight.netvirt.natservice.ha.SnatCentralizedSwitchChangeListener.add(SnatCentralizedSwitchChangeListener.java:32) 
        at org.opendaylight.genius.datastoreutils.AsyncDataTreeChangeListenerBase$DataTreeChangeHandler.run(AsyncDataTreeChangeListenerBase.java:167) 
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:?]

Looking at rest of the logs, the core issue is getting notifications for gw mac port creation before neutron router is created.

2018-05-03T13:22:28,967 | WARN  | org.opendaylight.yang.gen.v1.urn.opendaylight.neutron.ports.rev150712.ports.attributes.ports.Port_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | NeutronPortChangeListener        | 362 - org.opendaylight.netvirt.neutronvpn-impl - 0.6.0 | *No router found for router GW port c69fd64f-4c6a-43a6-ae21-c72f11d29a72 for router *29ec1425-72df-4b11-9327-527cf41d58a9


2018-05-03T13:22:29,263 | INFO  | org.opendaylight.yang.gen.v1.urn.opendaylight.neutron.l3.rev150712.routers.attributes.routers.Router_AsyncClusteredDataTreeChangeListenerBase-DataTreeChangeHandler-0 | NeutronRouterChangeListener      | 356 - org.opendaylight.netvirt.ipv6service-impl - 0.6.0 | Add Router notification handler is invoked Uuid [_value=29ec1425-72df-4b11-9327-527cf41d58a9].

2018-05-03T13:22:29,283 | INFO  | ForkJoinPool-2-worker-34 | VpnInstanceListener              | 370 - org.opendaylight.netvirt.vpnmanager-impl - 0.6.0| VPN-ADD: addVpnInstance: VPN Id 106159 generated for VpnInstanceName 29ec1425-72df-4b11-9327-527cf41d58a9

Router router = neutronvpnUtils.getNeutronRouter(routerId);
if (router == null) {
LOG.warn("No router found for router GW port {} for router {}", routerGwPort.getUuid().getValue(),
routerId.getValue());
return;
}
gwMacResolver.sendArpRequestsToExtGateways(router);

setExternalGwMac(routerGwPort, routerId);
}

private void setExternalGwMac(Port routerGwPort, Uuid routerId) {
// During full-sync networking-odl syncs routers before ports. As such,
// the MAC of the router's gw port is not available to be set when the
// router is written. We catch that here.

This means it is not an issue of notifications out of order, but getting gw port notification before router was created in mdsal. The most likely reason for this is n-odl sending out of order, but it shouldn't result in unhandled exception and thread termination.



 Comments   
Comment by Vishal Thapar [ 19/Jun/18 ]

Requires https://git.opendaylight.org/gerrit/73136 to work.

Comment by Sam Hague [ 19/Jun/18 ]

thapar are you saying the bug is fixed with the genius patch or that your patches are still required and will now work?

Comment by Vishal Thapar [ 19/Jun/18 ]

shague My netvirt patch is still required and will work after genius one. My patch was using genius API that was broken so patch was basically not doing anything, except for the exception catch in natservice code. Genius patch fixes the broken API.

Generated at Wed Feb 07 20:23:36 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.