[CONTROLLER-1016] Clustering : BGP - Linkstate topology missing. Created: 11/Nov/14 Updated: 04/Dec/14 Resolved: 04/Dec/14 |
|
| Status: | Verified |
| Project: | controller |
| Component/s: | mdsal |
| Affects Version/s: | Helium |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Vratko Polak | Assignee: | Moiz Raja |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Attachments: |
|
||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||
| External issue ID: | 2353 | ||||||||||||||||||||||||
| Description |
|
When Karaf is started with clustering (3 nodes, no replication, no persistence, quiet period after installing odl-mdsal-clustering), and then the odl-bgpcep-all feature is installed, there are several errors in the logs (to be attached shortly). It is not clear which is the primary cause (or even whether it is a bug in BGP or in clustering), so I am reporting only symptoms: Linkstate topology is missing links and nodes, but ipv4 topology is complete. Possible direct (as opposed to primary) cause from log: 2014-11-11 17:00:54,661 | ERROR | CommitFutures-8 | RIBImpl | 259 - org.opendaylight.bgpcep.bgp-rib-impl - 0.3.2.Helium-SR1 | Failed to commit RIB modification |
| Comments |
| Comment by Vratko Polak [ 11/Nov/14 ] |
|
Attachment bug2353.tar.xz has been added with description: Archive with curl output and karaf.log from 3 instances (third one did not even have ipv4 topology). |
| Comment by Robert Varga [ 11/Nov/14 ] |
|
Looks like a clustering problem. Since neither replication nor persistence is enabled, this should be working locally. I do not think we can recover from this failure in the app; we have no way of affecting the datastore, nor do we have visibility into how the DS heals. |
| Comment by Moiz Raja [ 11/Nov/14 ] |
|
Vratko, Could you please try this with a controller built from master. There were a couple of issues with the transaction chaining implementation in the clustered datastore that may possibly affect you. Also I will need to check the configuration to see how persistence/replication were disabled. |
| Comment by Vratko Polak [ 11/Nov/14 ] |
|
> Could you please try this with a controller built from master. Will do that tomorrow. > Also I will need to check the configuration to see how Persistence: The old way, using But snapshot and journal directories were not present, so hopefully there was not much difference in behavior from truly no-persistence setup. Replication: |
| Comment by Vratko Polak [ 11/Nov/14 ] |
|
Attachment conf.tar.xz has been added with description: Archive containing various configuration, as present when logs were gathered. |
| Comment by Vratko Polak [ 11/Nov/14 ] |
|
I forgot to mention that the same ODL and XRVR configuration was manually tested in a non-clustered setup (everything the same, except that the odl-mdsal-clustering feature was not installed), and the linkstate topology was present there. |
| Comment by Moiz Raja [ 12/Nov/14 ] |
|
I added some notes on how to disable persistence. This is applicable to post stable/helium code only. Persistence is enabled by default in stable/helium, so I wonder why the snapshot and journal directories were not created for you. |
| Comment by Moiz Raja [ 12/Nov/14 ] |
|
I looked at the attached configuration and it seems to be correct. I can only suspect that you are being hit by the following bugs, 2318 All of these are resolved on master but need to be merged to stable/helium. |
| Comment by Vratko Polak [ 12/Nov/14 ] |
|
> Could you please try this with a controller built from master. What I got is the same bug as described in >> snapshot and journal directories were not present, Pardon my weak English. I meant to say that those directories were not created before starting karaf. Of course they were created during karaf run. |
| Comment by Vratko Polak [ 12/Nov/14 ] |
|
The bug is also present on master, and the errors look similar. 2014-11-12 11:39:23,329 | ERROR | lt-dispatcher-17 | OneForOneStrategy | 234 - com.typesafe.akka.slf4j - 2.3.4 | Node identifier contains different value: (urn:opendaylight:params:xml:ns:yang:bgp-linkstate?revision=2013-11-25)isis-area-id[[B@7e1ac7e4] than value itself: [B@10398a6a which was also present in the Helium branch logs; I just did not recognize it then. 2014-11-12 11:39:28,362 | ERROR | CommitFutures-1 | RIBImpl | 263 - org.opendaylight.bgpcep.bgp-rib-impl - 0.4.0.SNAPSHOT | Broken chain in RIB KeyedInstanceIdentifier {targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.bgp.rib.rev130925.bgp.rib.Rib, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.bgp.rib.rev130925.BgpRib, org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.bgp.rib.rev130925.bgp.rib.Rib[key=RibKey [_id=Uri [_value=example-bgp-rib-2]]]]} transaction DOM-CHAIN-3-11 |
| Comment by Vratko Polak [ 12/Nov/14 ] |
|
Attachment karaf_20141112.log.xz has been added with description: master branch compressed full log from .10 node |
| Comment by Tom Pantelis [ 12/Nov/14 ] |
|
> This error emanates from the ImmutableLeafSetEntryNodeBuilder: ImmutableLeafSetEntryNode(final YangInstanceIdentifier.NodeWithValue nodeIdentifier, final T value, final Map<QName, String> attributes) { The value is a byte[] (as evidenced by the toString output "[B@..."), so I can see why the values don't match: byte[]#equals only checks reference equality. The code needs to check element equality when the value is an array. So this appears to be a bug in ImmutableLeafSetEntryNodeBuilder, but why doesn't it manifest with the in-memory data store (IMDS)? I suspect that with the IMDS the 2 byte[] vars happen to be the same instance/reference. However, in the CDS the data is serialized/de-serialized, which results in different instances, so the equality check fails. |
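To illustrate the reference-versus-element equality point above, here is a minimal standalone sketch. It does not use the yangtools classes; the class name `ByteArrayEquality`, the helpers `valuesMatch` and `roundTrip`, and the sample bytes are all hypothetical, and the array copy only stands in for what a serialize/de-serialize round trip would do to the instance identity.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical demo class, not part of yangtools or the controller.
class ByteArrayEquality {

    // Hypothetical helper: compares an identifier value with a node value.
    // Objects.deepEquals compares arrays element by element and falls back
    // to equals() for non-array objects.
    static boolean valuesMatch(Object identifierValue, Object nodeValue) {
        return Objects.deepEquals(identifierValue, nodeValue);
    }

    // Stand-in for a serialize/de-serialize round trip (assumption):
    // the result has the same contents but is a different array instance.
    static byte[] roundTrip(byte[] original) {
        return Arrays.copyOf(original, original.length);
    }

    public static void main(String[] args) {
        byte[] areaId = {0x49, 0x00, 0x01}; // made-up sample value
        byte[] copy = roundTrip(areaId);

        // Reference comparison: false, because the instances differ.
        System.out.println(areaId.equals(copy));
        // Element-wise comparison: true, because the contents are the same.
        System.out.println(Arrays.equals(areaId, copy));
        // Array-aware helper: true for equal contents.
        System.out.println(valuesMatch(areaId, copy));
    }
}
```

When both variables point at the same instance, which is what is suspected for the in-memory data store, `equals` would also return true, and that would explain why the check only fails in the clustered case.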
| Comment by Moiz Raja [ 12/Nov/14 ] |
|
Tom, this is related to the byte[] serialization defect. That was fixed; however, a NodeIdentifier can also have a byte[] as its value (leaf-lists), and that is where this problem occurs. We can fix this in the ValueSerializer, but I suspect there is more to this bug than just this. I will be investigating this further... |
| Comment by Moiz Raja [ 14/Nov/14 ] |
|
https://git.opendaylight.org/gerrit/#/c/12820/ - yangtools |
| Comment by Moiz Raja [ 15/Nov/14 ] |
|
https://git.opendaylight.org/gerrit/#/c/12827/ - controller - stable/helium |
| Comment by Moiz Raja [ 17/Nov/14 ] |
|
Yangtools patch merged https://git.opendaylight.org/gerrit/#/c/12820/ - yangtools:master |