[NEUTRON-159] Sporadic NeutronNetworkJAXBTest & NeutronFirewallJAXBTest failures Created: 10/Apr/18  Updated: 10/Oct/18  Resolved: 10/Oct/18

Status: Resolved
Project: neutron
Component/s: neutron-spi
Affects Version/s: Oxygen, Fluorine
Fix Version/s: Oxygen-SR4, Fluorine-SR1, Neon

Type: Bug Priority: Medium
Reporter: Michael Vorburger Assignee: Michael Vorburger
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Blocks
is blocked by NEUTRON-160 Bump version of EclipseLink Moxy JAXB... Resolved

 Description   

As raised on https://lists.opendaylight.org/pipermail/neutron-dev/2018-March/001633.html, and seen again e.g. in https://jenkins.opendaylight.org/releng/job/neutron-maven-verify-fluorine-mvn33-openjdk8/25/console for the (unrelated) minor change https://git.opendaylight.org/gerrit/#/c/70354/, we are occasionally, but regularly enough, hitting these weird recurring NeutronNetworkJAXBTest & NeutronFirewallJAXBTest failures:

Failed tests: 
  NeutronFirewallJAXBTest.test_NeutronFirewallPolicy_JAXB:61 NeutronFirewallPolicy JAXB Test 2: Testing tenant_id failed expected:<aa902936679e4ea29bfe1158e3450a13> but was:<null>
  NeutronFirewallJAXBTest.test_NeutronFirewall_JAXB:31 NeutronFirewall JAXB Test 2: Testing tenant_id failed expected:<aa902936679e4ea29bfe1158e3450a13> but was:<null>
  NeutronFloatingIpJAXBTest.test_NeutronFloatingIp_JAXB:34 NeutronFloatingIp JAXB Test 2: Testing tenant_id failed expected:<4969c491a3c74ee4af974e6d800c62de> but was:<null>
  NeutronLoadBalancerHealthMonitorJAXBTest.test_NeutronLoadBalancerHealthMonitor_JAXB:58 NeutronLoadBalancerHealthMonitor JAXB Test 10: Testing tenant_id failed expected:<00045a7b-796b-4f26-9cf9-9e82d248fda7> but was:<null>
  NeutronLoadBalancerJAXBTest.test_NeutronLoadBalancer_JAXB:48 NeutronLoadBalancer JAXB Test 8: Testing tenant_id failed expected:<4969c491a3c74ee4af974e6d800c62de> but was:<null>
  NeutronLoadBalancerListenerJAXBTest.test_NeutronLoadBalancerListener_JAXB:62 NeutronLoadBalancerListener JAXB Test 9: Testing tenant_id failed expected:<11145a7b-796b-4f26-9cf9-9e82d248fda7> but was:<null>
  NeutronLoadBalancerPoolJAXBTest.test_NeutronLoadBalancerPool_JAXB:47 NeutronLoadBalancerPool JAXB Test 7: Testing Tenant_id failed expected:<1a3e005cf9ce40308c900bcb08e5320c> but was:<null>
  NeutronLoadBalancerPoolMemberJAXBTest.test_NeutronLoadBalancerPoolMember_JAXB:47 NeutronLoadBalancerPoolMember JAXB Test 7: Testing  tenant_id  failed expected:<00045a7b-796b-4f26-9cf9-9e82d248fda7> but was:<null>
  NeutronMeteringLabelJAXBTest.test_NeutronMeteringLabel_JAXB:34 NeutronMeteringLabel JAXB Test 4: Testing tenant_id failed expected:<9bacb3c5d39d41a79512987f338cf177> but was:<null>
  NeutronNetworkJAXBTest.test_NeutronNetwork_MultipleProvider_JAXB:79 NeutronNetwork JAXB Test 2: Testing tenant_id failed expected:<9bacb3c5d39d41a79512987f338cf177> but was:<null>
  NeutronNetworkJAXBTest.test_NeutronNetwork_SingleProvider_JAXB:35 NeutronNetwork JAXB Test 2: Testing tenant_id failed expected:<9bacb3c5d39d41a79512987f338cf177> but was:<null>
  NeutronNetworkQosJAXBTest.test_NeutronNetworkQos_JAXB:33 NeutronNetwork JAXB Test 2: Testing tenant_id failed expected:<9bacb3c5d39d41a79512987f338cf177> but was:<null>
  NeutronPortJAXBTest.test_NeutronPort_JAXB:39 NeutronPort JAXB Test 2: Testing tenant_id failed expected:<9bacb3c5d39d41a79512987f338cf177> but was:<null>
  NeutronPortQosJAXBTest.test_PortQosEnabled_JAXB:42 NeutronPort JAXB Test 2: Testing tenant_id failed expected:<9bacb3c5d39d41a79512987f338cf177> but was:<null>
  NeutronPortSecurityJAXBTest.test_NeutronPortSecurityDefault_JAXB:77->test_PortSecurityEnabled_JAXB:87 NeutronPort JAXB Test 2: Testing tenant_id failed expected:<9bacb3c5d39d41a79512987f338cf177> but was:<null>
  NeutronPortSecurityJAXBTest.test_NeutronPortSecurityDisabled_JAXB:68->test_PortSecurityEnabled_JAXB:87 NeutronPort JAXB Test 2: Testing tenant_id failed expected:<9bacb3c5d39d41a79512987f338cf177> but was:<null>
  NeutronPortSecurityJAXBTest.test_NeutronPortSecurityEnabled_JAXB:63->test_PortSecurityEnabled_JAXB:87 NeutronPort JAXB Test 2: Testing tenant_id failed expected:<9bacb3c5d39d41a79512987f338cf177> but was:<null>
  NeutronQosJAXBTest.test_NeutronQosPolicy_JAXB:40 NeutronQosPolicy JAXB Test 2: Testing tenant_id failed expected:<aa902936679e4ea29bfe1158e3450a13> but was:<null>
  NeutronRouterJAXBTest.test_NeutronRouter_JAXB:45 NeutronFloatingIp JAXB Test 5: Testing tenant_id failed expected:<aa902936679e4ea29bfe1158e3450a13> but was:<null>
  NeutronSFCFlowClassifierJAXBTest.test_NeutronSFCFlowClassifier_JAXB:39 NeutronSFCFlowClassifier JAXB Test 2: Testing tenant_id failed expected:<4969c491a3c74ee4af974e6d800c62de> but was:<null>
  NeutronSFCPortChainJAXBTest.test_NeutronSFCPortChain_JAXB:38 NeutronSFCPortChain JAXB Test 2: Testing tenant_id failed expected:<4969c491a3c74ee4af974e6d800c62de> but was:<null>
  NeutronSFCPortPairGroupJAXBTest.test_NeutronSFCPortPairGroup_JAXB:32 NeutronSFCPortPairGroup JAXB Test 2: Testing tenant_id failed expected:<4969c491a3c74ee4af974e6d800c62de> but was:<null>
  NeutronSFCPortPairJAXBTest.test_NeutronSFCPortPair_JAXB:35 NeutronSFCPortPair JAXB Test 2: Testing tenant_id failed expected:<4969c491a3c74ee4af974e6d800c62de> but was:<null>
  NeutronSecurityGroupJAXBTest.test_NeutronSecurityGroup_JAXB:39 NeutronSecurityGroup JAXB Test 4: Testing port range min failed expected:<b4f50856753b4dc6afee5fa6b9b6c550> but was:<null>
  NeutronSecurityRuleJAXBTest.test_NeutronSecurityRule_JAXB:74 NeutronSecurityRule JAXB Test 10: Testing tenant id failed expected:<e4f50856753b4dc6afee5fa6b9b6c550> but was:<null>
  NeutronSubnetJAXBTest.test_NeutronSubnet_JAXB:47 NeutronSubnet JAXB Test 2: Testing tenant_id failed expected:<379ffe2b9cda498d9e17b319733ec889> but was:<null>
  NeutronTapFlowJAXBTest.test_NeutronTapFlow_JAXB:33 NeutronTapFlow JAXB Test 2: Testing tenant_id failed expected:<aa902936679e4ea29bfe1158e3450a13> but was:<null>
  NeutronTapServiceJAXBTest.test_NeutronTapService_JAXB:31 NeutronTapService JAXB Test 2: Testing tenant_id failed expected:<aa902936679e4ea29bfe1158e3450a13> but was:<null>
  NeutronTrunkJAXBTest.test_NeutronTrunk_JAXB:43 NeutronTrunk JAXB Test 5: Testing tenant_id failed expected:<cc3641789c8a4304abaa841c64f638d9> but was:<null>
  NeutronVpnIkePolicyJAXBTest.test_NeutronVpnIkePolicy_JAXB:33 NeutronVpnIkePolicy JAXB Test 2: Testing tenant id failed expected:<ccb81365fe36411a9011e90491fe1330> but was:<null>
  NeutronVpnIpSecPolicyJAXBTest.test_NeutronVpnIPSecPolicy_JAXB:34 NeutronVpnIpSecPolicy JAXB Test 2: Testing tenant id failed expected:<ccb81365fe36411a9011e90491fe1330> but was:<null>
  NeutronVpnIpSecSiteConnectionJAXBTest.test_NeutronVpnIPSecSiteConnection_JAXB:39 NeutronVpnIpSecSiteConnection JAXB Test 2: Testing tenant id failed expected:<ccb81365fe36411a9011e90491fe1330> but was:<null>
  NeutronVpnServiceJAXBTest.test_NeutronVPNService_JAXB:45 NeutronVpnService JAXB Test 6: Testing Tenant Id failed expected:<ccb81365fe36411a9011e90491fe1330> but was:<null>
Tests in error: 
  NeutronFirewallJAXBTest.test_NeutronFirewallRule_JAXB:89 NullPointer

Tests run: 62, Failures: 33, Errors: 1, Skipped: 0


 Comments   
Comment by Michael Vorburger [ 10/Apr/18 ]

I've run the NeutronNetworkJAXBTest about 20,000 times locally (using the org.opendaylight.infrautils.testutils.RunUntilFailureRule) but cannot reproduce this. How can this simple test fail only on Jenkins, and only every now and then?

This is typically indicative of a concurrency timing issue, but I don't see how that could apply here; this is REALLY weird, because the failing tests are quite simple: just some trivial-looking JAXB JSON unmarshalling and asserts. So I've had a closer look, mostly because I'm intrigued by the mystery, although I guess in theory this could be a real problem at runtime in production as well:

The 33 failures (in NeutronNetworkJAXBTest and the many other tests listed) are because a NeutronObject's getTenantID() is SOMETIMES null, even though the assert on getID(), which all the failing tests perform just before, passes. What's so special about this tenantID? Its getter has an "if isEmpty() return null" check... are there some known concurrency issues with JAXB where this could cause problems? I'm going to re-order the asserts in the tests to put tenantID last, and see whether ALL other properties did get unmarshalled correctly whenever this hits us next...
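
For readers who haven't looked at the code, the tenant_id getter pattern described here is roughly the following. This is a simplified, hypothetical sketch (the class name TenantHolder is illustrative; it is not the actual NeutronObject source): the field holds whatever was unmarshalled, and the getter reports an empty string as null.

```java
// Hypothetical, simplified sketch of the "empty tenant_id means null" getter
// pattern; not the actual NeutronObject source.
class TenantHolder {
    // Non-private field; per the later comments on this issue, the JAXB
    // @XmlElement annotation originally sat on the field, not on the accessors.
    String tenantID;

    String getTenantID() {
        // An empty string is reported as "no tenant"
        if (tenantID != null && tenantID.isEmpty()) {
            return null;
        }
        return tenantID;
    }

    void setTenantID(String tenantID) {
        this.tenantID = tenantID;
    }
}
```

So getTenantID() should only return null for an absent or empty tenant_id, which is what makes a null for a non-empty JSON value so puzzling.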

The NeutronFirewallJAXBTest.test_NeutronFirewallRule_JAXB:89 error is an NPE where JaxbTestHelper.jaxbUnmarshall returns null, so that's a little bit different (the entire object, not just 1 property).

The JaxbTestHelper has nothing obviously wrong that I can see. Maybe close the reader? Cache the JAXBContext?
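
On caching: JAXBContext is documented as thread-safe and is comparatively expensive to create, so a common approach is to build one per bound class and reuse it, while creating a fresh Unmarshaller per call (Unmarshaller is not thread-safe) and closing the Reader with try-with-resources. The sketch below shows only the caching part, generically and without a JAXB dependency; the class name ContextCache is hypothetical, and in a helper like JaxbTestHelper the factory would presumably wrap JAXBContext.newInstance(...) (which throws the checked JAXBException).

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: memoizes one expensive, thread-safe object
// (e.g. a JAXBContext) per bound class.
final class ContextCache<C> {
    private final Map<Class<?>, C> contexts = new ConcurrentHashMap<>();
    private final Function<Class<?>, C> factory;

    ContextCache(Function<Class<?>, C> factory) {
        this.factory = Objects.requireNonNull(factory);
    }

    C get(Class<?> boundClass) {
        // ConcurrentHashMap.computeIfAbsent applies the factory at most once per key
        return contexts.computeIfAbsent(boundClass, factory);
    }

    int size() {
        return contexts.size();
    }
}
```

Whether caching would change anything for this particular flakiness is speculation; it mainly avoids rebuilding the context for every test.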

Comment by Michael Vorburger [ 10/Apr/18 ]

> re-order the asserts in the tests to put tenantID last, and see if ALL
> other properties did get unmarshalled correctly, whenever this hits us next...

E.g. in NeutronLoadBalancerHealthMonitorJAXBTest, NeutronLoadBalancerPoolMemberJAXBTest, NeutronSFCPortPairGroupJAXBTest, NeutronMeteringLabelJAXBTest and more, this was coincidentally already done like this, so all other properties are asserted on just fine there, and only getTenantID() then returns null; this supports the theory that there is some weird issue specifically related to the tenant_id getter... hm. https://git.opendaylight.org/gerrit/#/c/70706/ will eventually confirm this for good, but I would say there is a high likelihood that this custom getter is somehow occasionally causing havoc.

Comment by Michael Vorburger [ 10/Apr/18 ]

On the off chance (probably unlikely, but you never know) that this is some wacky sporadic bug in the EclipseLink Moxy JAXB implementation we use for the JSON processing, let's try to bump it to the latest version in NEUTRON-160, and see if we are lucky and that stops this from ever hitting us again.

Comment by Michael Vorburger [ 17/Apr/18 ]

This happened again today, but only on 1 of the 5 Neutron changes of mine that were built today.

But c/70717 with the Moxy version bump is not yet merged; I need that to go in before looking any further.

Comment by Michael Vorburger [ 05/Jul/18 ]

This recently happened again on stable/oxygen, twice: on autorelease-release-oxygen/343 and autorelease-release-oxygen/339. I had a closer look through all autorelease-release-fluorine builds of the last month, and it is a data point worth noting that it hasn't happened on master anymore.

So while strictly speaking this is of course not conclusive proof, it supports the theory (or at least doesn't contradict it) that my earlier NEUTRON-160 work may actually have fixed this; I'll therefore attempt to back-port it from master to stable/oxygen.

Comment by Michael Vorburger [ 05/Jul/18 ]

Closing this issue now, as it's not been seen on master (Fluorine) in a while (see above), hoping that NEUTRON-160 via c/73782 does the same trick on Oxygen. We'll re-open if that wasn't it.

Comment by Michael Vorburger [ 03/Sep/18 ]

Seen again today on stable/oxygen on https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/autorelease-release-oxygen/408/ 

Comment by Michael Vorburger [ 03/Sep/18 ]

c/75508 does another bump of the EclipseLink Moxy JAXB impl, 2.7.1 → 2.7.3; if we are exceptionally lucky, that fixes some issue. But what we (someone) should really do here is run these occasionally failing tests under infrautils' RunUntilFailureRule, see if the failure can be reproduced by running overnight, and then debug.
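
The idea of running a test until it fails can be sketched in plain Java as a loop that repeats the test body until it throws or an iteration cap is reached. This is not the infrautils RunUntilFailureRule API (that is a JUnit rule, and its configuration is not reproduced here); the class RunUntilFailure and its runUntilFailure(body, maxRuns) method are hypothetical.

```java
// Hypothetical sketch of "run until failure": repeat a flaky test body until
// it throws, or until maxRuns attempts have all passed.
final class RunUntilFailure {
    private RunUntilFailure() {}

    /**
     * @return the 1-based run number on which body first threw,
     *         or -1 if all maxRuns runs passed
     */
    static int runUntilFailure(Runnable body, int maxRuns) {
        for (int run = 1; run <= maxRuns; run++) {
            try {
                body.run();
            } catch (RuntimeException | AssertionError e) {
                // Report the first failure so it can be debugged
                System.err.println("Failed on run " + run + ": " + e);
                return run;
            }
        }
        return -1;
    }
}
```

In practice the body would be the unmarshal-and-assert from one of the failing JAXB tests, with maxRuns set high enough to keep it busy overnight.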

Comment by Michael Vorburger [ 10/Sep/18 ]

Seen again today on neon on https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/autorelease-release-neon/34/. I just spent 10 minutes having another close look. E.g. in NeutronTrunkJAXBTest, that getTenantID() really cannot be null; I'm very puzzled.

Comment by Michael Vorburger [ 10/Sep/18 ]

Looked into this PITA again; I still don't really understand how it could happen, but I have just pushed three changes which may help with this tenant_id empty/null business (which, historically, is due to https://bugs.opendaylight.org/show_bug.cgi?id=4775 and its old https://git.opendaylight.org/gerrit/#/c/31324 and https://git.opendaylight.org/gerrit/#/c/31361/). Let's try to:

  1. c/75926 put that @XmlElement on the getter & setter instead of the field
  2. c/75927 use an EmptyStringAsNullAdapter (XmlAdapter) instead of if in the getter
  3. c/75928 make tenantID a private field - and cross fingers that (one of these) helps here...
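
Change 2 moves the empty-to-null logic out of the getter into an XmlAdapter. In JAXB such an adapter extends javax.xml.bind.annotation.adapters.XmlAdapter<ValueType, BoundType>, implements unmarshal (document value to Java value) and marshal (the reverse), and is attached to a property with @XmlJavaTypeAdapter. The sketch below keeps only the two conversion methods so it compiles without a JAXB dependency; the class name EmptyStringAsNull and the pass-through marshal behavior are assumptions, not the actual source of c/75927.

```java
// Hypothetical sketch of the conversion an "empty string as null" adapter
// performs. A real JAXB adapter would be declared as
//   class EmptyStringAsNullAdapter extends XmlAdapter<String, String>
// and attached with @XmlJavaTypeAdapter; kept JAXB-free here so it compiles
// on any JDK.
final class EmptyStringAsNull {
    // Reading: a "" value in the JSON/XML becomes null in the Java object
    String unmarshal(String value) {
        return value == null || value.isEmpty() ? null : value;
    }

    // Writing: values are passed through unchanged (assumed behavior)
    String marshal(String value) {
        return value;
    }
}
```

With this in place, the empty-to-null decision happens once, at unmarshalling time, and getTenantID() can become a plain getter over the private field from change 3.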

If we don't see it on master for say 2 weeks, then cherry-pick to stable/fluorine and stable/oxygen.

Comment by Michael Vorburger [ 02/Oct/18 ]

> If we don't see it on master for say 2 weeks, then cherry-pick to stable/fluorine and stable/oxygen.

done today

Generated at Wed Feb 07 20:25:43 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.