[CONTROLLER-1077] Clustering: Race condition may cause missing routes in RPC BucketStore Created: 19/Dec/14 Updated: 09/Jan/15 Resolved: 09/Jan/15 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | mdsal |
| Affects Version/s: | Helium |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Tom Pantelis | Assignee: | Tom Pantelis |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| External issue ID: | 2526 |
| Priority: | High |
| Description |
|
The workflow by which the RpcRegistry adds route identifiers into the BucketStore is as follows:
The problem is that the BucketStore exposes its internal localBucket, thus allowing it to be modified externally. The manner in which the RpcRegistry modifies the Bucket instance is not thread safe and introcudes a race condition. This can result in missing RouteIdentifiers as in the following scenario:
In addition, neither the 'data' member nor "version" member in Bucket are volatile so the writes may not be visible to other threads. Also setData should atomically set 'data' and 'version' for consistency. Propose solution: The BucketStore should not expose its internal state and allow it to be mutated externally. We should remove the GetLocalBucket message. Instead of getting the current routing table, modifying a local copy, then writing it back, which is problematic, we should just send an AddToLocalBucket message to the BucketStore and have it update its local Bucket. Similar with remove. |
| Comments |
| Comment by Tom Pantelis [ 19/Dec/14 ] |
|
I created a unit test that illustrates the bug. It sends N AddOrUpdateRoutes messages and then verifies the routes IDs are all in the BucketStore. The test fails sporadically with a varying # missing – of course, the higher N, the more often it fails. I even saw it fail once with only 2 (it got 1 out of 2). |
| Comment by Tom Pantelis [ 19/Dec/14 ] |
| Comment by Tom Pantelis [ 06/Jan/15 ] |
|
Manually cherry picked to stable/helium: https://git.opendaylight.org/gerrit/#/c/13938/ |