[CONTROLLER-1336] Clustering: Persist data lost Created: 26/May/15  Updated: 18/Aug/15  Resolved: 18/Aug/15

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: Helium
Fix Version/s: None

Type: Bug
Reporter: wuzhan Assignee: Tom Pantelis
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Linux
Platform: PC


External issue ID: 3369

 Description   

I used a one-node cluster.
1. First, I wrote data to this path:
InstanceIdentifier.builder(VirtualRouter.class)
.child(Vr.class, new VrKey(vrId)).toInstance();
I used the put method. The Restconf result is below (a sketch of this put follows the output):
<virtual-router xmlns="urn:opendaylight:vipran:virtual-router">
  <vr>
    <vr-id>${vrname}</vr-id>
    <vr-node-a>
      <vr-node-id>openflow:3</vr-node-id>
      <ip>10.10.30.10</ip>
    </vr-node-a>
    <vr-node-a>
      <vr-node-id>openflow:4</vr-node-id>
      <ip>10.10.40.10</ip>
    </vr-node-a>
    <topology-id>${vrname}</topology-id>
    <vr-node-b>
      <vr-node-id>openflow:1</vr-node-id>
      <ip>10.10.10.10</ip>
    </vr-node-b>
    <vr-node-b>
      <vr-node-id>openflow:2</vr-node-id>
      <ip>10.10.20.10</ip>
    </vr-node-b>
  </vr>
</virtual-router>
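
For context, a minimal sketch of how such a put is typically done through the MD-SAL binding DataBroker in Helium. VirtualRouter, Vr and VrKey are the reporter's generated vipran model bindings; their package, the surrounding VrWriter class and the vrData parameter are assumptions for illustration only, not taken from the report.

import org.opendaylight.controller.md.sal.binding.api.DataBroker;
import org.opendaylight.controller.md.sal.binding.api.WriteTransaction;
import org.opendaylight.controller.md.sal.common.api.data.LogicalDatastoreType;
import org.opendaylight.yangtools.yang.binding.InstanceIdentifier;
// VirtualRouter, Vr and VrKey are generated from the reporter's vipran
// virtual-router YANG model; their package is not given in the report.

public class VrWriter {
    private final DataBroker dataBroker;

    public VrWriter(DataBroker dataBroker) {
        this.dataBroker = dataBroker;
    }

    // Step 1: put the whole vr list entry, replacing any existing data at the path.
    public void putVr(String vrId, Vr vrData) throws Exception {
        InstanceIdentifier<Vr> path = InstanceIdentifier.builder(VirtualRouter.class)
                .child(Vr.class, new VrKey(vrId)).toInstance();

        WriteTransaction tx = dataBroker.newWriteOnlyTransaction();
        tx.put(LogicalDatastoreType.CONFIGURATION, path, vrData);
        tx.submit().checkedGet(); // block until the commit completes
    }
}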

2. Then, I wrote data to this path:
InstanceIdentifier.builder(VirtualRouter.class)
.child(Vr.class, new VrKey(uniIf.getVrId()))
.child(VrNodeA.class, new VrNodeAKey(uniIf.getUniNode()))
.child(VrNodeConnector.class, new VrNodeConnectorKey(uniIf.getUniPortName()))
.child(NodeUniIf.class, new NodeUniIfKey(uniIf.getUniName())).toInstance();
I used the merge method. The Restconf result is below (a sketch of this merge follows the output):
<virtual-router xmlns="urn:opendaylight:vipran:virtual-router">
  <vr>
    <vr-id>${vrname}</vr-id>
    <topology-id>${vrname}</topology-id>
    <vr-node-a>
      <vr-node-id>openflow:3</vr-node-id>
      <vr-node-connector>
        <node-connector-port-name>s3-eth1</node-connector-port-name>
        <node-uni-if>
          <uni-name>uni1</uni-name>
          <uni-node>openflow:3</uni-node>
          <vlan>1</vlan>
          <uni-description-info>vr1+openflow:3+s3-eth1+vlan1</uni-description-info>
          <ip-mask xmlns="urn:opendaylight:vipran:uni-bind">24</ip-mask>
          <vrf xmlns="urn:opendaylight:vipran:uni-bind">vrf1</vrf>
          <ip xmlns="urn:opendaylight:vipran:uni-bind">192.168.0.1</ip>
          <bridge-type xmlns="urn:opendaylight:vipran:uni-bind">1toN</bridge-type>
          <uni-port-name>s3-eth1</uni-port-name>
          <vr-id>${vrname}</vr-id>
        </node-uni-if>
      </vr-node-connector>
      <ip>10.10.30.10</ip>
    </vr-node-a>
    <vr-node-a>
      <vr-node-id>openflow:4</vr-node-id>
      <vr-node-connector>
        <node-connector-port-name>s4-eth1</node-connector-port-name>
        <node-uni-if>
          <uni-name>uni2</uni-name>
          <uni-node>openflow:4</uni-node>
          <vlan>2</vlan>
          <uni-description-info>vr1+openflow:4+s4-eth1+vlan2</uni-description-info>
          <uni-port-name>s4-eth1</uni-port-name>
          <vr-id>${vrname}</vr-id>
        </node-uni-if>
      </vr-node-connector>
      <ip>10.10.40.10</ip>
    </vr-node-a>
    <vr-node-b>
      <vr-node-id>openflow:1</vr-node-id>
      <ip>10.10.10.10</ip>
    </vr-node-b>
    <vr-node-b>
      <vr-node-id>openflow:2</vr-node-id>
      <ip>10.10.20.10</ip>
    </vr-node-b>
  </vr>
</virtual-router>
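
For context, a minimal sketch of the second write using merge, assuming the same DataBroker setup as the put sketch above. The generated VrNodeA, VrNodeConnector and NodeUniIf bindings and the uniIf getters come from the report; the UniIf type name, the nodeUniIfData parameter and the method shape are assumptions for illustration.

// Step 2: merge the node-uni-if entry under the existing vr-node-a.
// merge() combines the supplied data with whatever already exists at the
// path, leaving sibling data (e.g. the vr-node-b entries) untouched.
public void mergeNodeUniIf(UniIf uniIf, NodeUniIf nodeUniIfData) throws Exception {
    InstanceIdentifier<NodeUniIf> path = InstanceIdentifier.builder(VirtualRouter.class)
            .child(Vr.class, new VrKey(uniIf.getVrId()))
            .child(VrNodeA.class, new VrNodeAKey(uniIf.getUniNode()))
            .child(VrNodeConnector.class, new VrNodeConnectorKey(uniIf.getUniPortName()))
            .child(NodeUniIf.class, new NodeUniIfKey(uniIf.getUniName())).toInstance();

    WriteTransaction tx = dataBroker.newWriteOnlyTransaction();
    tx.merge(LogicalDatastoreType.CONFIGURATION, path, nodeUniIfData);
    tx.submit().checkedGet();
}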
3. I rebooted ODL; the vr-node-b data was lost, while the vr-node-a data remained.
I confirm that before the restart I was able to see both vr-node-a and vr-node-b in the data.

<virtual-router xmlns="urn:opendaylight:vipran:virtual-router">
  <vr>
    <vr-id>${vrname}</vr-id>
    <vr-node-a>
      <vr-node-id>openflow:3</vr-node-id>
      <vr-node-connector>
        <node-connector-port-name>s3-eth1</node-connector-port-name>
        <node-uni-if>
          <uni-name>uni1</uni-name>
          <uni-node>openflow:3</uni-node>
          <vlan>1</vlan>
          <uni-description-info>vr1+openflow:3+s3-eth1+vlan1</uni-description-info>
          <vrf xmlns="urn:opendaylight:vipran:uni-bind">vrf1</vrf>
          <ip-mask xmlns="urn:opendaylight:vipran:uni-bind">24</ip-mask>
          <bridge-type xmlns="urn:opendaylight:vipran:uni-bind">1toN</bridge-type>
          <ip xmlns="urn:opendaylight:vipran:uni-bind">192.168.0.1</ip>
          <uni-port-name>s3-eth1</uni-port-name>
          <vr-id>${vrname}</vr-id>
        </node-uni-if>
      </vr-node-connector>
    </vr-node-a>
    <vr-node-a>
      <vr-node-id>openflow:4</vr-node-id>
      <vr-node-connector>
        <node-connector-port-name>s4-eth1</node-connector-port-name>
        <node-uni-if>
          <uni-name>uni2</uni-name>
          <uni-node>openflow:4</uni-node>
          <vlan>2</vlan>
          <uni-description-info>vr1+openflow:4+s4-eth1+vlan2</uni-description-info>
          <uni-port-name>s4-eth1</uni-port-name>
          <vr-id>${vrname}</vr-id>
        </node-uni-if>
      </vr-node-connector>
    </vr-node-a>
  </vr>
</virtual-router>

4. If I use the merge method for both writes, the data survives the restart. I have tested this (see the sketch below).
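
A sketch of that workaround, assuming the putVr() sketch above: the initial write uses merge() instead of put(), so both writes are merges.

// Workaround from step 4: use merge() for the initial vr write as well.
public void mergeVr(String vrId, Vr vrData) throws Exception {
    InstanceIdentifier<Vr> path = InstanceIdentifier.builder(VirtualRouter.class)
            .child(Vr.class, new VrKey(vrId)).toInstance();

    WriteTransaction tx = dataBroker.newWriteOnlyTransaction();
    tx.merge(LogicalDatastoreType.CONFIGURATION, path, vrData);
    tx.submit().checkedGet();
}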



 Comments   
Comment by Tony Tkacik [ 26/May/15 ]

Are you able to replicate this on Lithium, or does it affect only Helium?

Comment by wuzhan [ 26/May/15 ]

(In reply to Tony Tkacik from comment #1)
> Are you able to replicate this on Lithium, or does it affect only Helium?

I only tested it on Helium SR3.
Where can I get a stable Lithium release?

Comment by Tony Tkacik [ 26/May/15 ]

Need more data points: are both writes in the same transaction or in different transactions?
How many writes had already happened to the data store?

Comment by wuzhan [ 27/May/15 ]

(In reply to Tony Tkacik from comment #3)
> Need more data points: are both writes in the same transaction or in different transactions?
> How many writes had already happened to the data store?

Different transactions.
Only these two transactions, plus some other transactions in the topology.

Comment by Tom Pantelis [ 03/Jun/15 ]

Can you repro and upload the karaf.log?

After doing that:

  • shutdown the controller
  • edit etc/opendaylight/karaf/05-clustering.xml. Find the <module> entry whose <name> is "distributed-config-store-module" and add the following under <module>:

<config-properties>
    <shard-journal-recovery-log-batch-size>1</shard-journal-recovery-log-batch-size>
</config-properties>

I assume you're writing to the config data store. If not, find "distributed-operational-store-module" and add the same except with tag <operational-properties>.

  • delete the journal and snapshot directories
  • restart the controller and see if it reproduces.

Also please test this with stable/lithium to see if it repros (w/o setting shard-journal-recovery-log-batch-size to 1).

Comment by Moiz Raja [ 09/Jun/15 ]

This may not be an issue in Lithium - Tom could not reproduce. Changing milestone to Helium-4.

Comment by Tom Pantelis [ 10/Jun/15 ]

To proceed with this, we need the bug reporter to test with shard-journal-recovery-log-batch-size set to 1, or with the latest stable/helium, which has shard-journal-recovery-log-batch-size set to 1 by default.

Also test against stable/lithium.

I tried to reproduce this by creating a YANG model that mostly matches the XML data from the description and doing a put and a merge similarly in separate transactions, but all the data was correctly recovered on restart in both Lithium and Helium.

Comment by Tom Pantelis [ 13/Jun/15 ]

I'll close this by the end of next week if we don't hear back from the bug reporter.

Comment by Moiz Raja [ 18/Aug/15 ]

Closing this as there was no response from the reporter.
