[YANGTOOLS-656] Occasionally, DTCL delete notification does not include "data before" Created: 30/Aug/16 Updated: 10/Apr/22 Resolved: 23/Nov/16 |
|
| Status: | Resolved |
| Project: | yangtools |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Milos Fabian | Assignee: | Robert Varga |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Attachments: |
|
| External issue ID: | 6577 |
| Description |
|
Remote peer advertise 1M routes then it is disconnected, empty topology is expected, NPE appeared instead: araf | command | 306 - org.apache.karaf.log.command - 3.0.7 | ROBOT MESSAGE: Starting test Kill_Talking_BGP_Speaker 2016-08-30 02:41:19,919 | INFO | h for user karaf | command | 306 - org.apache.karaf.log.command - 3.0.7 | ROBOT MESSAGE: Starting test Store_Results_For_Talking_BGP_Speaker ], modificationType=DELETE, childModification={}], oldMeta = null, newMeta = null}}} (transaction DOM-CHAIN-4-364481) was not completely propagated to listener org.opendaylight.bgpcep.bgp.topology.provider.Ipv4ReachabilityTopologyBuilder@245a6da5 |
| Comments |
| Comment by Milos Fabian [ 30/Aug/16 ] |
|
Appeared in ingest-mixed Boron CSIT suite |
| Comment by Milos Fabian [ 30/Aug/16 ] |
|
Similar problem observed when testing locally: |
| Comment by Milos Fabian [ 30/Aug/16 ] |
|
(In reply to Milos Fabian from comment #2) Happening with some very specific prefixes: 2016-08-30 16:36:40,242 | INFO | on-dispatcher-96 | LocRibWriter | 288 - org.opendaylight.bgpcep.bgp-rib-impl - 0.7.0.SNAPSHOT | Data before missing for: (urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)ipv4-route[ {(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)prefix=1.2.120.198/32}], ChildNode{mod = NodeModification [identifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)ipv4-route[ {(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)prefix=1.2.120.198/32, (urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)path-id=0}], modificationType=DELETE, childModification={}], oldMeta = null, newMeta = null} ], ChildNode{mod = NodeModification [identifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)ipv4-route[ {(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)prefix=1.2.121.198/32, (urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)path-id=0}], modificationType=DELETE, childModification={}], oldMeta = null, newMeta = null} ], ChildNode{mod = NodeModification [identifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)ipv4-route[ {(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)prefix=1.2.121.197/32, (urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)path-id=0}], modificationType=DELETE, childModification={}], oldMeta = null, newMeta = null} ], ChildNode{mod = NodeModification [identifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)ipv4-route[ {(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)prefix=1.2.120.199/32, (urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)path-id=0}], modificationType=DELETE, childModification={}], oldMeta = null, newMeta = null} ], ChildNode{mod = NodeModification [identifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)ipv4-route[ {(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)prefix=1.2.121.199/32, (urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)path-id=0}], modificationType=DELETE, childModification={}], oldMeta = null, newMeta = null} ], ChildNode{mod = NodeModification [identifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)ipv4-route[ {(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)prefix=1.2.120.197/32, (urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)path-id=0}], modificationType=DELETE, childModification={}], oldMeta = null, newMeta = null} |
| Comment by Milos Fabian [ 31/Aug/16 ] |
|
The problem addressed to controller/md-sal devs: https://lists.opendaylight.org/pipermail/controller-dev/2016-August/012561.html |
| Comment by Ajay L [ 01/Sep/16 ] |
|
Same issue is seen with Beryllium as well. Please refer exception encountered in read-life network environment under high load: 2016-09-01 20:22:04,696 | WARN | lt-dispatcher-42 | AbstractTopologyBuilder | 329 - org.opendaylight.bgpcep.bgp-topology-provider - 0.5.0.Beryllium | Data change org.opendaylight.controller.md.sal.binding.impl.LazyDataTreeModification@963f998 (transaction DOM-CHAIN-10-159563) was not completely propagated to listener org.opendaylight.bgpcep.bgp.topology.provider.LinkstateTopologyBuilder@2976ef9 |
| Comment by Robert Varga [ 26/Oct/16 ] |
|
Can you report the entire DTCN, from root for this case? |
| Comment by Robert Varga [ 07/Nov/16 ] |
|
This looks like some badness in the DataTree implementation. Specifically it looks like AbstractModifiedNodeBasedCandidateNode is allocating a ChildNode with no backing metadata. |
| Comment by Robert Varga [ 07/Nov/16 ] |
|
Now, that child node was resolved to DELETE (in SchemaAwareApplyOperation.apply()), which indicates that the 'before' metadata was supplied to it, while it was not actually present in the tree. This evidence points it squarely at StructuralContainerModificationStrategy#57, which is the only place playing this kind of a trick. |
| Comment by Robert Varga [ 07/Nov/16 ] |
|
We need logs replicating this issue, with https://git.opendaylight.org/gerrit/48049 applied to the controller and ShardDataTree logging set to TRACE. We need to see both the modification and the data tree candidate which trigger this behavior. |
| Comment by Claudio David Gasparini [ 16/Nov/16 ] |
|
Attachment karaf.log.tar.gz has been added with description: log |
| Comment by Claudio David Gasparini [ 16/Nov/16 ] |
|
Attachment karaf.log.1.tar.gz has been added with description: log1 |
| Comment by Claudio David Gasparini [ 16/Nov/16 ] |
|
Attachment karaf.log.2.tar.gz has been added with description: log2 |
| Comment by Claudio David Gasparini [ 16/Nov/16 ] |
|
Attachment karaf.log.3.tar.gz has been added with description: log3 |
| Comment by Claudio David Gasparini [ 16/Nov/16 ] |
|
Attachment karaf.log.4.tar.gz has been added with description: log4 |
| Comment by Robert Varga [ 16/Nov/16 ] |
|
Analyzing karaf.log.4, the problem is narrowed down a bit. In this specific case, the route being deleted is: prefix=1.2.120.199/32 path-id=0 The deletion record is correct, as the route is present in parent's oldMeta and it is not present in its newMeta. The stack trace has a Iterators$8.transform() as the caller of AbstractModifiedNodeBasedCandidateNode.childNode(), which points to its getChildNodes() method, which is logical, as the parent is marked as SUBTREE_MODIFIED and BGP code is traversing it to find out what is going on. Based on this, the cause seems to be the fact that eventhough the record is in oldMeta's children map, the call AbstractModifiedNodeBasedCandidateNode.childMeta() -> MaterializedContainerNode.getChild() fails to find it – but it is registering it in its children map. Since the objects in question are supposed to be immutable at this point, I think it is safe to say that the DataTree code itself is correct and the problem lies in one of three components:
|
| Comment by Robert Varga [ 16/Nov/16 ] |
|
Actually, there is a fourth component involved: ImmutableOffsetMap, which is the backing map inside NodeIdentifierWithPredicates. |
| Comment by Robert Varga [ 16/Nov/16 ] |
|
Ajay, can you check what was the JVM version involved in that, please? |
| Comment by Robert Varga [ 16/Nov/16 ] |
|
debug patch to eliminate custom maps from the picture: https://git.opendaylight.org/gerrit/48420 |
| Comment by Claudio David Gasparini [ 18/Nov/16 ] |
|
tested against u112 and ImmutableMaps, issue still present |
| Comment by Claudio David Gasparini [ 18/Nov/16 ] |
|
Attachment BUG_6577_STEPS.txt has been added with description: Steps to reproduce bug |
| Comment by Robert Varga [ 18/Nov/16 ] |
|
After additional testing it seems the problem is either in JVM or TrieMap. Specifially, I have added code, which when it encounters this conditions walks through the map's entrySet() and tries to find the corresponding key. What I am getting is: 2016-11-18 18:00:17,821 | WARN | lt-dispatcher-36 | ctModifiedNodeBasedCandidateNode | 85 - org.opendaylight.yangtools.yang-data-impl - 1.0.2.SNAPSHOT | Missing key is (urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)ipv4-route[ {(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)path-id=0, (urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)prefix=1.2.121.197/32}], identity 1745067963, hash 1674416874 ], identity 1269043004, hash 1674416874 ], value=[ImmutableLeafNode{nodeIdentifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)path-id, value=0, attributes={}}, ImmutableLeafNode{nodeIdentifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)prefix, value=1.2.121.197/32, attributes={}}, ImmutableContainerNode{nodeIdentifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)attributes, value=[ImmutableContainerNode{nodeIdentifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)as-path, value=[], attributes={}}, ImmutableContainerNode{nodeIdentifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)origin, value=[ImmutableLeafNode{nodeIdentifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)value, value=igp, attributes={}}], attributes={}}, ImmutableContainerNode{nodeIdentifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)local-pref, value=[ImmutableLeafNode{nodeIdentifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)pref, value=100, attributes={}}], attributes={}}, ImmutableChoiceNode{nodeIdentifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)c-next-hop, value=[ImmutableContainerNode{nodeIdentifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)ipv4-next-hop, value=[ImmutableLeafNode{nodeIdentifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)global, value=127.0.0.1, attributes={}}], attributes={}}]}, ImmutableContainerNode{nodeIdentifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)multi-exit-disc, value=[ImmutableLeafNode{nodeIdentifier=(urn:opendaylight:params:xml:ns:yang:bgp-inet?revision=2015-03-05)med, value=0, attributes={}}], attributes={}}], attributes={}}], attributes={}}} So the key is present as a different object in the map, they compare as .equals() in both directions and report the same hash code. When we perform the lookup using the stored key, the map returns a value. When we perform the lookup using our key, the map returns null – which is a violation of java.util.Map contract. Either the key's identity hash or identity comparison is used somewhere. Looking through TrieMap implementation, I do not see identity hash being used, nor keys being compared in any other way than through .equals(). Next step will be to retry with JDK 1.8.0u60, to see if this problem is present. |
| Comment by Robert Varga [ 18/Nov/16 ] |
|
Tracked down to https://github.com/romix/java-concurrent-hash-trie-map/blob/master/src/main/java/com/romix/scala/collection/concurrent/TrieMap.java#L464, which looks like a bug. The same code is present in the Scala version of the code at https://github.com/scala/scala/blob/2.12.x/src/library/scala/collection/concurrent/TrieMap.scala#L264. I am not sure why identity comparison is used herre specifically, and the codepath only ever triggers for read-only trie – hence I guess it is not as well-tested as the read-write case. |
| Comment by Robert Varga [ 19/Nov/16 ] |
|
upstream: https://github.com/romix/java-concurrent-hash-trie-map/pull/24 |
| Comment by Robert Varga [ 19/Nov/16 ] |
| Comment by Robert Varga [ 21/Nov/16 ] |
|
Waiting for either an upstream release or helpdesk finish ticket #33071. |
| Comment by Robert Varga [ 22/Nov/16 ] |
|
ODL-private package seeded. |
| Comment by Robert Varga [ 23/Nov/16 ] |