[CONTROLLER-1583] sal-remoterpc-connector: install remote death watch Created: 17/Jan/17 Updated: 25/Jul/23 Resolved: 02/Feb/17 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Robert Varga | Assignee: | Robert Varga |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Issue Links: |
|
||||||||
| External issue ID: | 7573 | ||||||||
| Description |
|
Gossiper exchanges data with its peers, caching their buckets. When a node leaves the cluster, it now (after BUG-3128) removes the buckets of disconnected nodes. It does not, however, handle a hung or terminated peer Gossiper, in which case it keeps (and propagates) stale buckets. To handle this case, Gossiper needs to install a remote death watch, http://doc.akka.io/docs/akka/2.4/scala/remoting.html#Watching_Remote_Actors, so that it is notified when the advertising actor dies. When such an event occurs, it needs to invalidate the corresponding remote bucket and send a corresponding message to RpcRegistry. That way remote RPCs will be correctly unregistered, and any requests to those RPCs will fail fast instead of timing out. |
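The flow described above (watch the advertising actor, invalidate its bucket on termination, notify RpcRegistry) could look roughly like the following. This is a minimal sketch in plain Java with no Akka dependency: `RemoteBucketTracker`, its methods, and the notification strings are hypothetical names, and actor references are modelled as path strings. In an Akka actor the watch would be installed with `getContext().watch(actorRef)`, and `onTerminated` would be driven by the resulting `Terminated` message.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Gossiper/BucketStore bookkeeping; not the real ODL classes.
final class RemoteBucketTracker {
    // A cached remote bucket together with the actor that advertised it.
    private static final class Bucket {
        final String watchedActor;
        final List<String> rpcs;

        Bucket(String watchedActor, List<String> rpcs) {
            this.watchedActor = watchedActor;
            this.rpcs = new ArrayList<>(rpcs);
        }
    }

    // Remote buckets keyed by the address of the advertising node.
    private final Map<String, Bucket> remoteBuckets = new HashMap<>();

    // Messages that would be sent to RpcRegistry (recorded here for illustration).
    final List<String> registryNotifications = new ArrayList<>();

    // Called when a bucket arrives via gossip. In Akka, this is the point
    // where the death watch on the advertising actor would be installed.
    void receiveBucket(String nodeAddress, String actorPath, List<String> rpcs) {
        remoteBuckets.put(nodeAddress, new Bucket(actorPath, rpcs));
    }

    // Called when the watched actor is reported dead: drop every bucket it
    // advertised and tell the registry so the remote RPCs get unregistered.
    void onTerminated(String actorPath) {
        Iterator<Map.Entry<String, Bucket>> it = remoteBuckets.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Bucket> entry = it.next();
            if (entry.getValue().watchedActor.equals(actorPath)) {
                String nodeAddress = entry.getKey();
                it.remove();
                registryNotifications.add("RemoveRemoteBucket:" + nodeAddress);
            }
        }
    }

    boolean hasBucket(String nodeAddress) {
        return remoteBuckets.containsKey(nodeAddress);
    }
}
```

Because the watched actor is stored with the bucket itself, a late termination notice for an actor that has since been replaced does not remove the newer bucket.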
| Comments |
| Comment by Robert Varga [ 17/Jan/17 ] |
|
DeathWatch needs an ActorRef, which points towards BucketStore/RpcRegistry as the place to do the actual monitoring. As it turns out, RoutingTable (i.e. RpcRegistry-level logic) contains an ActorRef which is useful to monitor. Hence the BucketStore should perform monitoring based on information provided via a common interface (BucketData extends Copier). When DeathWatch triggers, the normal bucket removal operation can be performed. |
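One possible shape for that common interface is sketched below. Everything here is an assumption made for illustration: the `getWatchActor` method name, the use of a path string in place of an `ActorRef`, and the `SketchRoutingTable` and `WatchingBucketStore` classes are hypothetical and may differ from the actual sal-remoterpc-connector types.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical shapes; the real Copier/BucketData interfaces may differ.
interface Copier<T> {
    T copy();
}

interface BucketData<T extends BucketData<T>> extends Copier<T> {
    // Stand-in for Optional<ActorRef>: path of the actor to monitor, if any.
    Optional<String> getWatchActor();
}

// RpcRegistry-level data that happens to carry a reference worth watching.
final class SketchRoutingTable implements BucketData<SketchRoutingTable> {
    private final String invokerPath;

    SketchRoutingTable(String invokerPath) {
        this.invokerPath = invokerPath;
    }

    @Override
    public SketchRoutingTable copy() {
        return new SketchRoutingTable(invokerPath);
    }

    @Override
    public Optional<String> getWatchActor() {
        return Optional.ofNullable(invokerPath);
    }
}

// The store stays generic: it only learns what to watch through BucketData.
final class WatchingBucketStore<T extends BucketData<T>> {
    private final Map<String, T> remoteBuckets = new HashMap<>();
    private final Set<String> watched = new HashSet<>();

    void updateRemoteBucket(String nodeAddress, T data) {
        remoteBuckets.put(nodeAddress, data.copy());
        // With Akka, this is where the death watch would be installed.
        data.getWatchActor().ifPresent(watched::add);
    }

    boolean isWatching(String actorPath) {
        return watched.contains(actorPath);
    }

    int bucketCount() {
        return remoteBuckets.size();
    }
}
```

Keeping the accessor on the data interface means the store needs no knowledge of RPC routing; it watches whatever the bucket payload exposes and falls back to not watching when nothing is provided.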
| Comment by Robert Varga [ 18/Jan/17 ] |
| Comment by Robert Varga [ 02/Feb/17 ] |