[CONTROLLER-1740] Producer un-registration on isolated node is never propagated to the rest of the cluster Created: 12/Jul/17  Updated: 25/Jul/23  Resolved: 13/Jul/21

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Vratko Polak Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Relates
relates to MDSAL-397 Check path contention when create tre... Resolved
External issue ID: 8845

 Description   

This is a continuation of CONTROLLER-1731 (from comment 3 on) with a better title and less outdated information.

The test scenario: each member starts a transaction producer. The leader is isolated for slightly longer than 120 seconds. A new leader is elected, and the test sees the isolated producer report RequestTimeoutException; at that point the producer unregisters while the member is still isolated.
After rejoin, the test attempts to start a new producer. During producer registration the frontend rejects it, because the new leader still believes the old producer is registered.

The complete karaf.log is at [10]; the final relevant segment follows:
2017-07-11 11:58:31,266 | DEBUG | qtp138207402-77 | ShardedDOMDataTree | 147 - org.opendaylight.mdsal.dom-broker - 2.2.1.SNAPSHOT | Creating producer org.opendaylight.mdsal.dom.broker.ShardedDOMDataTreeProducer@e86db
2017-07-11 11:58:31,266 | TRACE | qtp138207402-77 | ShardedDOMDataTree | 147 - org.opendaylight.mdsal.dom-broker - 2.2.1.SNAPSHOT | Adding subtree DOMDataTreeIdentifier{datastore=CONFIGURATION, root=/(tag:opendaylight.org,2017:controller:yang:lowlevel:target?revision=2017-02-15)id-ints/id-int/id-int[{(tag:opendaylight.org,2017:controller:yang:lowlevel:target?revision=2017-02-15)id=prefix-3}]} to map.
2017-07-11 11:58:31,291 | ERROR | qtp138207402-77 | ContainerResponse | 68 - com.sun.jersey.jersey-server - 1.17.0 | The RuntimeException could not be mapped to a response, re-throwing to the HTTP container
org.opendaylight.controller.cluster.datastore.exceptions.TimeoutException: Sending message class org.opendaylight.controller.cluster.sharding.messages.ProducerCreated to actor Actorakka://opendaylight-cluster-data/user/ShardedDOMDataTreeFrontend#725503709 failed. Try again later.
at org.opendaylight.controller.cluster.datastore.utils.ActorContext.executeOperation(ActorContext.java:318)
at org.opendaylight.controller.cluster.sharding.DistributedShardedDOMDataTree.createProducer(DistributedShardedDOMDataTree.java:305)
at Proxy547e830a_606c_4629_8e3c_527344f37367.createProducer(Unknown Source)
at Proxya432b2a8_a907_437a_a7a2_58deacd6d56e.createProducer(Unknown Source)
at org.opendaylight.controller.clustering.it.provider.impl.ProduceTransactionsHandler.start(ProduceTransactionsHandler.java:67)
...

[10] https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-li-only-carbon/4/odl3_karaf.log.gz



 Comments   
Comment by Robert Varga [ 12/Jul/17 ]

This boils down to producer registrations being disseminated via ShardDataTreeActor messages, which do not work during isolation.

I think the proper fix is to replicate cluster-wide producers via a runtime shard analogous to the sharding table shard.

Comment by Vratko Polak [ 12/Jul/17 ]

If producer registration is to be reworked, I think we also need automatic unregistration on timeout, to prevent unreachable nodes from owning subtrees indefinitely.
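A minimal sketch of what timeout-based unregistration could look like, assuming a lease model in which a producer must be renewed periodically or its registration lapses. The class `ProducerLeases`, its methods, and the TTL parameter are hypothetical and are not part of the controller code base; time is passed in explicitly so the behaviour is easy to reason about.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lease table: a registration lapses unless it is renewed in time. */
public final class ProducerLeases {
    private final long ttlMillis;
    // producer id -> absolute time (ms) at which the lease expires
    private final Map<String, Long> expiryByProducer = new HashMap<>();

    public ProducerLeases(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Registers the producer, or extends an existing lease, starting from nowMillis. */
    public synchronized void renew(String producerId, long nowMillis) {
        expiryByProducer.put(producerId, nowMillis + ttlMillis);
    }

    /** Returns true only while the lease is still valid; expired entries are dropped. */
    public synchronized boolean isRegistered(String producerId, long nowMillis) {
        Long expiry = expiryByProducer.get(producerId);
        if (expiry == null) {
            return false;
        }
        if (nowMillis >= expiry) {
            // The owner failed to renew (for example because it is isolated).
            expiryByProducer.remove(producerId);
            return false;
        }
        return true;
    }
}
```

With such a scheme, an isolated member that cannot renew would lose its subtree after the TTL, whether or not its explicit unregistration ever reaches the rest of the cluster.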

Also, when a new producer is being registered, every possible subtree overlap should be detected. That means a newly registered producer must find neither an ancestor producer nor a descendant producer already registered.

This includes inter-member race conditions, for example member-1 trying to register for /a while member-2 tries to register for /a/b.
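The overlap rule above can be sketched as a prefix check in both directions: two subtrees conflict when either path is a prefix of the other. The following is an illustration only; `SubtreeRegistry` and its methods are hypothetical names, paths are simplified to lists of strings, and the `synchronized` methods serialize callers within one JVM only. Resolving the inter-member race would additionally require all members to route the check through a single replicated decision point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical single-process registry that rejects overlapping subtree registrations. */
public final class SubtreeRegistry {
    private final List<List<String>> registered = new ArrayList<>();

    /** True if a is equal to b or is an ancestor of b. */
    static boolean isPrefix(List<String> a, List<String> b) {
        return a.size() <= b.size() && b.subList(0, a.size()).equals(a);
    }

    /** Two subtrees overlap if one is an ancestor of, or equal to, the other. */
    static boolean overlaps(List<String> a, List<String> b) {
        return isPrefix(a, b) || isPrefix(b, a);
    }

    /**
     * Registers the path if no ancestor, descendant, or identical path is present.
     * Returns the conflicting path on rejection, or an empty Optional on success.
     */
    public synchronized Optional<List<String>> tryRegister(List<String> path) {
        for (List<String> existing : registered) {
            if (overlaps(existing, path)) {
                return Optional.of(existing);
            }
        }
        registered.add(List.copyOf(path));
        return Optional.empty();
    }

    /** Removes a previously registered path; returns false if it was not present. */
    public synchronized boolean unregister(List<String> path) {
        return registered.remove(path);
    }
}
```

In the /a versus /a/b example, whichever call reaches `tryRegister` first succeeds and the other receives the conflicting path, regardless of which of the two is the ancestor.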

Comment by Robert Varga [ 13/Jul/21 ]

The entire implementation was removed in CONTROLLER-1977.
