[CONTROLLER-1711] Listener registration lost when local replica is removed Created: 07/Jun/17 Updated: 25/Jul/23 Resolved: 05/Oct/18 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug |
| Reporter: | Vratko Polak | Assignee: | Unassigned |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Operating System: All |
| External issue ID: | 8629 |
| Description |
|
This was seen on Sandbox with code that is already merged to stable/carbon. The title says "possibly" because, of the two similar test cases, only one failed [0]: the one where the listener was on the same member as the shard leader, which was removed by calling remove-prefix-shard-replica. In the huge karaf.log [1], which includes some debug output, look between the removal starting at 19:23:10,608 and the suite adding the replica back at 19:23:41,773. [0] https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/6/log.html.gz#s1-s20-t1-k2-k16-k2-k1-k4-k7-k1 |
| Comments |
| Comment by Tom Pantelis [ 08/Jun/17 ] |
|
So, from my understanding: a DTCL is registered for shard A on member1, then shard A is removed from member1. Later shard A is re-added to member1 and the DTCL is not notified. Is this the scenario? If so, that's to be expected, as the DTCL registrations belong to the shard and thus "go away" when the shard does. Kind of an edge case... |
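The point that a registration's lifetime is tied to the shard can be illustrated with a minimal, self-contained sketch. This is plain Java, not the actual controller API; `Shard`, `registerListener`, and `RegistrationDemo` are illustrative names only:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: a shard owns its listener registrations, so discarding the
// shard discards the registrations with it.
class Shard {
    private final List<Runnable> listeners = new ArrayList<>();

    // Returning a handle that deregisters on close(), in the spirit of a
    // ListenerRegistration.
    AutoCloseable registerListener(Runnable listener) {
        listeners.add(listener);
        return () -> listeners.remove(listener);
    }

    void notifyListeners() {
        listeners.forEach(Runnable::run);
    }
}

public class RegistrationDemo {
    public static void main(String[] args) {
        Shard shardA = new Shard();
        shardA.registerListener(() -> System.out.println("notified"));
        shardA.notifyListeners();   // prints "notified"

        // "Removing" the replica discards the shard object, and the
        // registrations it owns go with it. Re-adding the replica
        // creates a fresh shard with an empty listener list.
        shardA = new Shard();
        shardA.notifyListeners();   // prints nothing: registration is gone
    }
}
```

This matches the observed behaviour: after the replica is removed and re-added, the consumer would have to register its listener again.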
| Comment by Vratko Polak [ 09/Jun/17 ] |
|
Now seen on RelEng [5].

> Later shard A is re-added to member1

That is only done in test teardown, so it is not part of the scenario.

> DTCL is registered for shard A on member1 then shard A is removed from member1.

This is a DDTL [2], not a DTCL. If the listener is on a follower, removal of the leader's shard replica does not lead to failures [3], thus the new leader continues to send notifications without missing any (or the new leader does not commit anything, which I would expect to lead to errors in producers [4]).

[2] https://github.com/opendaylight/controller/blob/5997e14efab9c12e7be2b7fb83f7efe16c2bfe7c/opendaylight/md-sal/samples/clustering-test-app/provider/src/main/java/org/opendaylight/controller/clustering/it/provider/impl/IdIntsDOMDataTreeLIstener.java#L23 |
| Comment by Vratko Polak [ 13/Jun/17 ] |
|
Sandbox testing results. Is there a reasonable way to wait until no notification is being processed? [6] https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/21/log.html.gz#s1-s4-t1-k2-k14 |
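One common way to answer "wait until no notification is being processed" is to record the time of the last notification and wait until the listener has been idle for a settle period. This is a hypothetical helper sketched in plain Java, not controller code; `QuiescenceTracker` and its method names are made up for illustration:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: the listener calls onNotification() on every event;
// a test thread calls awaitQuiescent() to wait until events stop arriving.
class QuiescenceTracker {
    private final AtomicLong lastNotificationNanos = new AtomicLong(System.nanoTime());

    void onNotification() {
        lastNotificationNanos.set(System.nanoTime());
    }

    // Returns true once no notification has arrived for settleMillis,
    // or false if that never happens within timeoutMillis.
    boolean awaitQuiescent(long settleMillis, long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() < deadline) {
            long idleMillis = (System.nanoTime() - lastNotificationNanos.get()) / 1_000_000L;
            if (idleMillis >= settleMillis) {
                return true;
            }
            Thread.sleep(10);   // poll; a condition variable would also work
        }
        return false;
    }
}
```

Note the caveat: this only detects that notifications stopped arriving. It cannot distinguish "quiet because processing finished" from "quiet because the local replica is gone", which is exactly the failure mode discussed in this issue.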
| Comment by Vratko Polak [ 23/Jun/17 ] |
|
> the new leader continues to send notifications without missing any

Okay, so this is where I was wrong. The current cluster-wide listener implementation apparently relies on local replica data: no local replica, no data change notifications. Since this bug now tracks the missing functionality, I will open another one for listener failures not related to a missing local replica. |