[CONTROLLER-1740] Producer un-registration on isolated node is never propagated to the rest of the cluster Created: 12/Jul/17 Updated: 25/Jul/23 Resolved: 13/Jul/21 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Vratko Polak | Assignee: | Unassigned |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Issue Links: |
|
||||||||
| External issue ID: | 8845 | ||||||||
| Description |
|
This is continuation of The test scenario: Each member starts a transaction producer. The leader is isolated for slightly longer than 120 seconds. A new leader is elected, and the test sees the isolated producer report RequestTimeoutException; at that point the producer unregisters while the member is still isolated. The complete karaf.log is here [10], the final relevant segment: ]} to map. |
| Comments |
| Comment by Robert Varga [ 12/Jul/17 ] |
|
This boils down to producer registrations being disseminated via ShardDataTreeActor messages, which do not work during isolation. I think the proper fix is to replicate cluster-wide producers via a runtime shard analogous to the sharding table shard. |
| Comment by Vratko Polak [ 12/Jul/17 ] |
|
If producer registration is to be reworked, I think we also need automatic unregistration on timeout, to prevent unreachable nodes from owning subtrees indefinitely. Also, when a new producer is being registered, every possible subtree overlap should be detected. That means a free-standing producer should see no ancestor producer, but also no descendant producer. This includes inter-member race conditions, for example member-1 trying to register for /a while member-2 tries to register for /a/b. |
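
The overlap and timeout rules proposed in this comment could look roughly like the sketch below. It is an illustration only and not the controller's API: `ProducerRegistry`, `overlaps`, `parse`, the slash-separated path syntax, and the 120-second lease are all hypothetical names and values. It also assumes that a single authority serializes every registration request, which is the property that would settle the member-1 versus member-2 race.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Path = Tuple[str, ...]


def parse(path: str) -> Path:
    """Split '/a/b' into ('a', 'b'); '/' becomes the empty (root) path."""
    return tuple(seg for seg in path.split("/") if seg)


def overlaps(a: Path, b: Path) -> bool:
    """True if the paths are equal or one is an ancestor of the other."""
    shorter = min(len(a), len(b))
    return a[:shorter] == b[:shorter]


@dataclass
class ProducerRegistry:
    """Hypothetical registry; assumes all calls go through one serializing authority."""

    lease_seconds: float = 120.0
    # Maps a registered subtree to (owning member, time of registration).
    _owners: Dict[Path, Tuple[str, float]] = field(default_factory=dict)

    def _expire(self, now: float) -> None:
        # Automatic unregistration: drop entries older than the lease.
        self._owners = {
            path: (member, since)
            for path, (member, since) in self._owners.items()
            if now - since < self.lease_seconds
        }

    def register(self, member: str, path: str, now: float) -> bool:
        """Register a producer unless an ancestor or descendant is already held."""
        self._expire(now)
        wanted = parse(path)
        if any(overlaps(wanted, held) for held in self._owners):
            return False
        self._owners[wanted] = (member, now)
        return True

    def unregister(self, member: str, path: str) -> bool:
        """Remove a registration if it is held by the given member."""
        wanted = parse(path)
        entry = self._owners.get(wanted)
        if entry is None or entry[0] != member:
            return False
        del self._owners[wanted]
        return True
```

With this sketch, if member-1 holds /a, a request from member-2 for /a/b is rejected until member-1 unregisters or its lease expires. Comparing whole path segments rather than raw strings keeps /ab from being treated as a descendant of /a.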
| Comment by Robert Varga [ 13/Jul/21 ] |
|
The entire implementation was removed in |