[NETVIRT-1132] NPE: StateTunnelList_ADTCL handleTunnelEventForDpn: Unable to handle the tunnel event for srcDpnId 110780754426209 srcTepIp 10.30.170.87 remoteDpnId 106567621600656 destTepIp 10.30.170.102 handleTunnelEventForDPN(TunnelInterfaceStateListener.java:340) Created: 06/Mar/18  Updated: 13/Mar/18  Resolved: 13/Mar/18

Status: Resolved
Project: netvirt
Component/s: General
Affects Version/s: Oxygen
Fix Version/s: Oxygen

Type: Bug Priority: Medium
Reporter: Sam Hague Assignee: Victor Pickard
Resolution: Done Votes: 0
Labels: csit:exception, patch_merged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File odl_karaf.log.tar.xz    

 Description   

https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-queens-upstream-stateful-oxygen/191/odl_1/odl1_karaf.log.gz

The exception happens in the add and update paths. Add path is here and later is the update path:

2018-03-06T05:23:47,604 | INFO  | org.opendaylight.yang.gen.v1.urn.opendaylight.genius.itm.op.rev160406.tunnels_state.StateTunnelList_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | TunnelInterfaceStateListener     | 367 - org.opendaylight.netvirt.vpnmanager-impl - 0.6.0.SNAPSHOT | handleTunnelEventForDPN: Handle tunnel event for srcDpn 110780754426209 SrcTepIp 10.30.170.87 DestTepIp 10.30.170.102 
2018-03-06T05:23:47,604 | INFO  | org.opendaylight.yang.gen.v1.urn.opendaylight.genius.itm.op.rev160406.tunnels_state.StateTunnelList_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | TunnelInterfaceStateListener     | 367 - org.opendaylight.netvirt.vpnmanager-impl - 0.6.0.SNAPSHOT | handleTunnelEventForDPN: Tunnel ADD event received for Dpn 110780754426209 VTEP Ip 10.30.170.87 destTepIp
2018-03-06T05:23:47,604 | INFO  | ForkJoinPool-1-worker-1 | ElanTunnelInterfaceStateListener | 348 - org.opendaylight.netvirt.elanmanager-impl - 0.6.0.SNAPSHOT | Handling tunnel state event for srcDpId 110780754426209 and dstDpId 106567621600656 
2018-03-06T05:23:47,605 | ERROR | org.opendaylight.yang.gen.v1.urn.opendaylight.genius.itm.op.rev160406.tunnels_state.StateTunnelList_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | TunnelInterfaceStateListener     | 367 - org.opendaylight.netvirt.vpnmanager-impl - 0.6.0.SNAPSHOT | handleTunnelEventForDpn: Unable to handle the tunnel event for srcDpnId 110780754426209 srcTepIp 10.30.170.87 remoteDpnId 106567621600656 destTepIp 10.30.170.102
java.lang.NullPointerException: null
	at org.opendaylight.netvirt.vpnmanager.TunnelInterfaceStateListener.handleTunnelEventForDPN(TunnelInterfaceStateListener.java:340) [367:org.opendaylight.netvirt.vpnmanager-impl:0.6.0.SNAPSHOT]
	at org.opendaylight.netvirt.vpnmanager.TunnelInterfaceStateListener.add(TunnelInterfaceStateListener.java:217) [367:org.opendaylight.netvirt.vpnmanager-impl:0.6.0.SNAPSHOT]
	at org.opendaylight.netvirt.vpnmanager.TunnelInterfaceStateListener.add(TunnelInterfaceStateListener.java:76) [367:org.opendaylight.netvirt.vpnmanager-impl:0.6.0.SNAPSHOT]
	at org.opendaylight.genius.datastoreutils.AsyncDataTreeChangeListenerBase$DataTreeChangeHandler.run(AsyncDataTreeChangeListenerBase.java:167) [260:org.opendaylight.genius.mdsalutil-api:0.4.0.SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:?]
	at java.lang.Thread.run(Thread.java:748) [?:?]

This also happens in the update path:

2018-03-06T05:23:55,042 | INFO  | org.opendaylight.yang.gen.v1.urn.opendaylight.genius.itm.op.rev160406.tunnels_state.StateTunnelList_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | TunnelInterfaceStateListener     | 367 - org.opendaylight.netvirt.vpnmanager-impl - 0.6.0.SNAPSHOT | handleTunnelEventForDPN: Tunnel ADD event received for Dpn 110780754426209 VTEP Ip 10.30.170.87 destTepIp
2018-03-06T05:23:55,043 | ERROR | org.opendaylight.yang.gen.v1.urn.opendaylight.genius.itm.op.rev160406.tunnels_state.StateTunnelList_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | TunnelInterfaceStateListener     | 367 - org.opendaylight.netvirt.vpnmanager-impl - 0.6.0.SNAPSHOT | handleTunnelEventForDpn: Unable to handle the tunnel event for srcDpnId 110780754426209 srcTepIp 10.30.170.87 remoteDpnId 106567621600656 destTepIp 10.30.170.102
java.lang.NullPointerException: null
	at org.opendaylight.netvirt.vpnmanager.TunnelInterfaceStateListener.handleTunnelEventForDPN(TunnelInterfaceStateListener.java:340) [367:org.opendaylight.netvirt.vpnmanager-impl:0.6.0.SNAPSHOT]
	at org.opendaylight.netvirt.vpnmanager.TunnelInterfaceStateListener.update(TunnelInterfaceStateListener.java:169) [367:org.opendaylight.netvirt.vpnmanager-impl:0.6.0.SNAPSHOT]
	at org.opendaylight.netvirt.vpnmanager.TunnelInterfaceStateListener.update(TunnelInterfaceStateListener.java:76) [367:org.opendaylight.netvirt.vpnmanager-impl:0.6.0.SNAPSHOT]
	at org.opendaylight.genius.datastoreutils.AsyncDataTreeChangeListenerBase$DataTreeChangeHandler.run(AsyncDataTreeChangeListenerBase.java:172) [260:org.opendaylight.genius.mdsalutil-api:0.4.0.SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:?]
	at java.lang.Thread.run(Thread.java:748) [?:?]


 Comments   
Comment by Victor Pickard [ 08/Mar/18 ]

OK, here are my observations from looking into this exception. Sam, thanks for forwarding the email thread about neutron creating initial networks being enabled. That was a good clue to help narrow down this issue.
 
Here is the sandbox job [1] I ran with extra debug to get the interface name that was not found in config d/s that was causing the exception.
 
 
*+Why is the interface ad7be255-1fc2-4bab-93e1-9641504c27fb not found in config d/s? 
+* 
TunnelInterfaceStateListener::handleTunnelEventForDPN makes a call to genius InterfaceManager::getDpnInterfaceList(), which returns interfaces from OPERATIONAL d/s, NOT config d/s. 

It appears that this port was never added to config d/s..... perhaps because neutron create initial networks was enabled?? 

So, we get a list of interfaces from genius from oper d/s, and then look for the interface (netvirt) in config d/s, and it is not there. This doesn't seem correct, should genius be returning interfaces from config d/s? I suppose a more general question, how can a port end up in oper d/s and not be in config d/s? Is it because neutron created the port before ODL was up, and did not receive the neutron port create event? Probably so..... 
 
 
Debugging details
 

Interface that is not found in config d/s ad7be255-1fc2-4bab-93e1-9641504c27fb 
2018-03-07T20:34:26,183 | INFO  | org.opendaylight.yang.gen.v1.urn.opendaylight.genius.itm.op.rev160406.tunnels_state.StateTunnelList_AsyncDataTreeChangeListenerBase-Data
TreeChangeHandler-0 | TunnelInterfaceStateListener     | 364 - org.opendaylight.netvirt.vpnmanager-impl - 0.7.0.SNAPSHOT | handleTunnelEventForDPN: Processing Interface n
ame ad7be255-1fc2-4bab-93e1-9641504c27fb, Interface Interfaces [_interfaceName=ad7be255-1fc2-4bab-93e1-9641504c27fb, _interfaceType=class org.opendaylight.yang.gen.v1.urn
.ietf.params.xml.ns.yang.iana._if.type.rev140508.L2vlan, augmentation=[]]

2018-03-07T20:34:26,183 | ERROR | org.opendaylight.yang.gen.v1.urn.opendaylight.genius.itm.op.rev160406.tunnels_state.StateTunnelList_AsyncDataTreeChangeListenerBase-Data
TreeChangeHandler-0 | TunnelInterfaceStateListener     | 364 - org.opendaylight.netvirt.vpnmanager-impl - 0.7.0.SNAPSHOT | handleTunnelEventForDpn: Unable to handle the t
unnel event for srcDpnId 277980448198840 srcTepIp 10.30.170.83 remoteDpnId 154421049938961 destTepIp 10.30.170.96 
java.lang.NullPointerException: null 
       at org.opendaylight.netvirt.vpnmanager.TunnelInterfaceStateListener.handleTunnelEventForDPN(TunnelInterfaceStateListener.java:343) [364:org.opendaylight.netvirt.v
pnmanager-impl:0.7.0.SNAPSHOT] 
       at org.opendaylight.netvirt.vpnmanager.TunnelInterfaceStateListener.add(TunnelInterfaceStateListener.java:219) [364:org.opendaylight.netvirt.vpnmanager-impl:0.7.0
.SNAPSHOT] 
       at org.opendaylight.netvirt.vpnmanager.TunnelInterfaceStateListener.add(TunnelInterfaceStateListener.java:76) [364:org.opendaylight.netvirt.vpnmanager-impl:0.7.0.
SNAPSHOT] 
       at org.opendaylight.genius.datastoreutils.AsyncDataTreeChangeListenerBase$DataTreeChangeHandler.run(AsyncDataTreeChangeListenerBase.java:167) [258:org.opendayligh
t.genius.mdsalutil-api:0.5.0.SNAPSHOT] 
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:?] 
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:?] 
       at java.lang.Thread.run(Thread.java:748) [?:?] 
 

1st robot test... exception comes out even before 1st test starts... is this related to neutron creating initial networks? 
2018-03-07T20:40:17,258 | INFO  | pipe-log:log "ROBOT MESSAGE: Starting suite /w/workspace/netvirt-csit-1node-openstack-queens-vic-upstream-stateful-fluorine/test/csit/su
ites/openstack/connectivity/l2.robot" | core                             | 123 - org.apache.karaf.log.core - 4.1.3 | ROBOT MESSAGE: Starting suite /w/workspace/netvirt-cs
it-1node-openstack-queens-vic-upstream-stateful-fluorine/test/csit/suites/openstack/connectivity/l2.robot 
2018-03-07T20:40:27,363 | INFO  | pipe-log:log "ROBOT MESSAGE: Starting suite /w/workspace/netvirt-csit-1node-openstack-queens-vic-upstream-stateful-fluorine/test/csit/su
ites/openstack/connectivity/l2.robot" | core                             | 123 - org.apache.karaf.log.core - 4.1.3 | ROBOT MESSAGE: Starting test Create VLAN Network net_

When is the interface ad7be255-1fc2-4bab-93e1-9641504c27fb created? 

2018-03-07T20:34:05,041 | INFO  | ForkJoinPool-1-worker-6 | NeutronPortChangeListener        | 356 - org.opendaylight.netvirt.neutronvpn-impl - 0.7.0.SNAPSHOT | Of-port-interface creation for port ad7be255-1fc2-4bab-93e1-9641504c27fb 

Port created a good 6+ minutes before any robot csit tests start... 

2018-03-07T20:34:06,719 | INFO  | org.opendaylight.yang.gen.v1.urn.opendaylight.neutron.ports.rev150712.ports.attributes.ports.Port_AsyncClusteredDataTreeChangeListenerBa
se-DataTreeChangeHandler-0 | NeutronPortChangeListener        | 350 - org.opendaylight.netvirt.ipv6service-impl - 0.7.0.SNAPSHOT | IPv6: addInterfaceInfo is invoked for ahost interface Port{getAllowedAddressPairs=[], getDeviceId=dhcpe5510a36-5d54-52f9-9732-333c3a36fa80-0e1251ed-f2a1-42b9-940e-1c0f5341e75d, getDeviceOwner=network:dhcp, getExtraDhcpOpts=[], getFixedIps=[FixedIps{getIpAddress=IpAddress [_ipv4Address=Ipv4Address [_value=10.0.0.2]], getSubnetId=Uuid [_value=b8b43068-d1cd-49b6-88f0-9c6db910d37
5], augmentations={}}, FixedIps{getIpAddress=IpAddress [_ipv6Address=Ipv6Address [_value=fd12:5ac9:f253:0:f816:3eff:fe6e:aa7d]], getSubnetId=Uuid [_value=c2b3a176-3742-45a5-ba3d-7302d0ddf2b7], augmentations={}}], getMacAddress=MacAddress [_value=fa:16:3e:6e:aa:7d], getName=, getNetworkId=Uuid [_value=0e1251ed-f2a1-42b9-940e-1c0f5341e75d],
getRevisionNumber=7, getSecurityGroups=[], getTenantId=Uuid [_value=b76b365c-9351-482f-9f5c-beb8570e29a4], getUuid=Uuid [_value=ad7be255-1fc2-4bab-93e1-9641504c27fb], isAdminStateUp=true, augmentations={interfaceorg.opendaylight.yang.gen.v1.urn.opendaylight.neutron.portsecurity.rev150712.PortSecurityExtension=PortSecurityExtension

{isPor tSecurityEnabled=false}

,interfaceorg.opendaylight.yang.gen.v1.urn.opendaylight.neutron.binding.rev150712.PortBindingExtension=PortBindingExtension{getHostId=sandbox-31488-4-1-devstack-pike-0.vexxhost.net, getProfile={}, getVifDetails=[VifDetails\{getDetailsKey=support_vhost_user, getValue=false, augmentations={}}, VifDetails\{getDetailsKey=uuid, getValue=46f51f05-7822-4af1-a99d-03e0f03cdd59, augmentations={}}, VifDetails\{getDetailsKey=has_datapath_type_netdev, getValue=false, augmentations={}}], getVifTyp
e=ovs, getVnicType=normal}}}, fixedIp: FixedIps{getIpAddress=IpAddress [_ipv6Address=Ipv6Address [_value=fd12:5ac9:f253:0:f816:3eff:fe6e:aa7d]], getSubnetId=Uuid [_value=c2b3a176-3742-45a5-ba3d-7302d0ddf2b7], augmentations={}}

Robot L2 suite runs. Openstack Suite setup cleans up, and removes the port 

https://logs.opendaylight.org/sandbox/vex-yul-odl-jenkins-2/netvirt-csit-1node-openstack-queens-vic-upstream-stateful-fluorine/4/robot-plugin/log_01_l2.html.gz#s1-k2-k1-k
7-k4 

KEYWORD BuiltIn . Log ${list} 
Documentation: 
Logs the given message with the given level. 

Start / End / Elapsed:  20180307 15:47:06.683 / 20180307 15:47:06.684 / 00:00:00.001 
15:47:06.684    INFO    [

{u'Status': u'ACTIVE', u'Fixed IP Addresses': u"ip_address='10.0.0.2', subnet_id='b8b43068-d1cd-49b6-88f0-9c6db910d375'\nip_address='fd12:5ac9:f2 53:0:f816:3eff:fe6e:aa7d', subnet_id='c2b3a176-3742-45a5-ba3d-7302d0ddf2b7'", u'ID': u'ad7be255-1fc2-4bab-93e1-9641504c27fb', u'MAC Address': u'fa:16:3e:6e:aa:7d', u'Name ': u''}

Start / End / Elapsed:  20180307 15:47:06.708 / 20180307 15:47:08.388 / 00:00:01.680 
15:47:06.710    INFO    Running command 'openstack port delete ad7be255-1fc2-4bab-93e1-9641504c27fb 2>&1'.
 
 
[1] https://logs.opendaylight.org/sandbox/vex-yul-odl-jenkins-2/netvirt-csit-1node-openstack-queens-vic-upstream-stateful-fluorine/4/odl_1/odl1_karaf.log.gz
 

Comment by Victor Pickard [ 12/Mar/18 ]

OK, the previous analysis was incorrect. Here is the latest update, thanks for helping with this Michael/Stephen/Sam!

 

 
Victor,
 
as discussed today on #opendaylight-netvirt, and with full due credit to Sam and Stephen, "here's what happened", as Mr. Monk would say:
 
 
The NPE at line 343 of netvirt.vpnmanager.TunnelInterfaceStateListener (that's WITH your https://git.opendaylight.org/gerrit/#/c/69222/2/vpnmanager/impl/src/main/java/org/opendaylight/netvirt/vpnmanager/TunnelInterfaceStateListener.java; on master that is 341, this caused me a bit of confusion initially) in this line:
 
                if (vpnInterface != null && !vpnInterface.isScheduledForRemove()) {
 
is caused by Boolean isScheduledForRemove() - note the object Boolean instead of the primitive boolean. It's because of what's called "auto boxing" in Java, and not that obvious to spot... basically a boolean can be (only) true or false but a Boolean can that OR null, and in an if (theBoolean) you can get an NPE wherever it has to get "coerced" to a boolean (like in an if, but e.g. !vpnInterface.isScheduledForRemove() etc. could cause it as well).
 
A trivial fix could be to just change that to this, as suggested by Stephen:
                 if (vpnInterface != null && !Boolean.TRUE.equals(vpnInterface.isScheduledForRemove())) {
 
which, just for illustration basicaly just does this:
 
                if (vpnInterface != null && vpnInterface.isScheduledForRemove() != null && !vpnInterface.isScheduledForRemove()) {
 
BUT I would question whether it's expected to get null in there - whever that vpnInterface was created, is there a setScheduledForRemove() call missing on a VpnInterfaceBuilder somewhere? in the YANG model for this, should the scheduledForRemove perhaps default to false? Or we would you’d need to add “mandatory” to the YANG model?
 
Another angle would be to do something on the code quality metrics side of things to see if we could fail on unchecked possible Boolean usage as boolean. I'm not sure if such a check already exists, but will have a look.
 

Comment by Victor Pickard [ 12/Mar/18 ]

Michael/Stephen/Sam,
Thanks for taking a look at this NPE and providing the details, especially the explanation about "auto boxing", I clearly didn't see it in the code, your analysis is much appreciated.

As Sam noted in #opendaylight-netvirt, this vpnInterface.isScheduledForRemove() came in a recent patch, https://git.opendaylight.org/gerrit/#/c/68151/.

I have pushed a patch [0] that makes the call to setScheduledForRemove() on the VpnInterface when it is created for all references, and also, added the extra null check as suggested by Stephen in TunnelInterfaceStateListener.java.

I'll add Kiran and Phillipe to the review.

[0] https://git.opendaylight.org/gerrit/69398

Comment by Sam Hague [ 13/Mar/18 ]

https://git.opendaylight.org/gerrit/69398

Generated at Wed Feb 07 20:23:18 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.