[NETVIRT-1249] Thread terminated due to an uncaught exception Created: 08/May/18 Updated: 04/Jul/18 Resolved: 04/Jul/18 |
|
| Status: | Resolved |
| Project: | netvirt |
| Component/s: | General |
| Affects Version/s: | Fluorine |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Medium |
| Reporter: | Vishal Thapar | Assignee: | Vishal Thapar |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
When running following test case in downstream scale/perf setup we see that some VMs remain unpingable even after 300 seconds and we see a lot of the following exceptions in karaf logs 2018-04-13T22:57:05,411 | ERROR | org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | AsyncDataTreeChangeListenerBase | 263 - org.opendaylight.genius.mdsalutil-api - 0.4.0 | Thread terminated due to uncaught exception: org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 Core issue is getting gw port notification before router was created in mdsal as confirmed by following log entry: 2018-05-03T13:22:28,967 | WARN | org.opendaylight.yang.gen.v1.urn.opendaylight.neutron.ports.rev150712.ports.attributes.ports.Port_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | NeutronPortChangeListener | 362 - org.opendaylight.netvirt.neutronvpn-impl - 0.6.0 | No router found for router GW port c69fd64f-4c6a-43a6-ae21-c72f11d29a72 for router 29ec1425-72df-4b11-9327-527cf41d58a9 {{ Router router = neutronvpnUtils.getNeutronRouter(routerId); setExternalGwMac(routerGwPort, routerId); private void setExternalGwMac(Port routerGwPort, Uuid routerId) { The most likely reason for this is n-odl sending out of order and should be addressed in n-odl/NN. But even then it shouldn't lead to unhandled exception and thread termination. Since it is a race condition, a simple retry might be enough to avoid this problem. |
||
| Priority: | Normal |
| Description |
|
When running downstream scale/perf testing we see that some VMs remain unpingable even after 300 seconds and we see a lot of the exceptions in karaf logs when running following tests and ping fails Create a network Exceptions 2018-04-13T22:57:05,411 | ERROR | org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | AsyncDataTreeChangeListenerBase | 263 - org.opendaylight.genius.mdsalutil-api - 0.4.0.redhat-5 | Thread terminated due to uncaught exception: org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0
java.lang.IllegalArgumentException: Invalid range: -1, expected: [[0..18446744073709551615]].
at org.opendaylight.yang.gen.v1.urn.opendaylight.model.match.types.rev131026.match.TunnelBuilder.checkTunnelIdRange(TunnelBuilder.java:76)
at org.opendaylight.yang.gen.v1.urn.opendaylight.model.match.types.rev131026.match.TunnelBuilder.setTunnelId(TunnelBuilder.java:81)
at org.opendaylight.genius.mdsalutil.matches.MatchTunnelId.populateBuilder(MatchTunnelId.java:40)
at org.opendaylight.genius.mdsalutil.matches.MatchTunnelId.populateBuilder(MatchTunnelId.java:18)
at org.opendaylight.genius.mdsalutil.matches.MatchInfoHelper.createInnerMatchBuilder(MatchInfoHelper.java:29) [
at org.opendaylight.genius.mdsalutil.MDSALUtil.buildMatches(MDSALUtil.java:336)
at org.opendaylight.genius.mdsalutil.FlowEntity.getFlowBuilder(FlowEntity.java:87)
at org.opendaylight.genius.mdsalutil.internal.MDSALManager.writeFlowEntityInternal(MDSALManager.java:180)
at org.opendaylight.genius.mdsalutil.internal.MDSALManager.installFlowInternal(MDSALManager.java:142)
at org.opendaylight.genius.mdsalutil.internal.MDSALManager.installFlow(MDSALManager.java:609)
at Proxya2c1fd24_dd19_4d37_b669_2a61174cba99.installFlow(Unknown Source) [?:?]
at Proxy8f8e72d8_5b27_48d7_a9e4_a78c5736e1ae.installFlow(Unknown Source) [?:?]
at org.opendaylight.netvirt.natservice.internal.AbstractSnatService.syncFlow(AbstractSnatService.java:329)
at org.opendaylight.netvirt.natservice.internal.ConntrackBasedSnatService.installTerminatingServiceTblEntry(ConntrackBasedSnatService.java:148)
at org.opendaylight.netvirt.natservice.internal.ConntrackBasedSnatService.installSnatSpecificEntriesForNaptSwitch(ConntrackBasedSnatService.java:73)
at org.opendaylight.netvirt.natservice.internal.AbstractSnatService.handleSnat(AbstractSnatService.java:126)
at org.opendaylight.netvirt.natservice.internal.FlatVlanConntrackBasedSnatService.handleSnat(FlatVlanConntrackBasedSnatService.java:49)
at org.opendaylight.netvirt.natservice.internal.AbstractSnatService.handleSnatAllSwitch(AbstractSnatService.java:105)
at org.opendaylight.netvirt.natservice.internal.FlatVlanConntrackBasedSnatService.handleSnatAllSwitch(FlatVlanConntrackBasedSnatService.java:39)
at org.opendaylight.netvirt.natservice.internal.SnatServiceManagerImpl.notify(SnatServiceManagerImpl.java:57)
at org.opendaylight.netvirt.natservice.ha.SnatCentralizedSwitchChangeListener.add(SnatCentralizedSwitchChangeListener.java:87)
at org.opendaylight.netvirt.natservice.ha.SnatCentralizedSwitchChangeListener.add(SnatCentralizedSwitchChangeListener.java:32)
at org.opendaylight.genius.datastoreutils.AsyncDataTreeChangeListenerBase$DataTreeChangeHandler.run(AsyncDataTreeChangeListenerBase.java:167)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:?]
Looking at rest of the logs, the core issue is getting notifications for gw mac port creation before neutron router is created. 2018-05-03T13:22:28,967 | WARN | org.opendaylight.yang.gen.v1.urn.opendaylight.neutron.ports.rev150712.ports.attributes.ports.Port_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | NeutronPortChangeListener | 362 - org.opendaylight.netvirt.neutronvpn-impl - 0.6.0 | *No router found for router GW port c69fd64f-4c6a-43a6-ae21-c72f11d29a72 for router *29ec1425-72df-4b11-9327-527cf41d58a9 2018-05-03T13:22:29,263 | INFO | org.opendaylight.yang.gen.v1.urn.opendaylight.neutron.l3.rev150712.routers.attributes.routers.Router_AsyncClusteredDataTreeChangeListenerBase-DataTreeChangeHandler-0 | NeutronRouterChangeListener | 356 - org.opendaylight.netvirt.ipv6service-impl - 0.6.0 | Add Router notification handler is invoked Uuid [_value=29ec1425-72df-4b11-9327-527cf41d58a9]. 2018-05-03T13:22:29,283 | INFO | ForkJoinPool-2-worker-34 | VpnInstanceListener | 370 - org.opendaylight.netvirt.vpnmanager-impl - 0.6.0| VPN-ADD: addVpnInstance: VPN Id 106159 generated for VpnInstanceName 29ec1425-72df-4b11-9327-527cf41d58a9
This means it is not an issue of notifications out of order, but getting gw port notification before router was created in mdsal. The most likely reason for this is n-odl sending out of order, but it shouldn't result in unhandled exception and thread termination. |
| Comments |
| Comment by Vishal Thapar [ 19/Jun/18 ] |
|
Requires https://git.opendaylight.org/gerrit/73136 to work. |
| Comment by Sam Hague [ 19/Jun/18 ] |
|
thapar are you saying the bug is fixed with the genius patch or that your patches are still required and will now work? |
| Comment by Vishal Thapar [ 19/Jun/18 ] |
|
shague My netvirt patch is still required and will work after genius one. My patch was using genius API that was broken so patch was basically not doing anything, except for the exception catch in natservice code. Genius patch fixes the broken API. |