[CONTROLLER-1856] CSIT test Local_Leader_Shutdown fails in ask mode Created: 02/Aug/18  Updated: 06/Aug/18  Resolved: 06/Aug/18

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: Fluorine, Neon

Type: Bug Priority: Medium
Reporter: Josh Hershberg Assignee: Tom Pantelis
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File log.html.gz     File odl1.log     File odl2.log     File odl3.log    
Issue Links:
Relates
relates to CONTROLLER-1786 Jolokia lookup says leader exists but... Resolved
relates to OVSDB-434 br-int not created after failing over... Resolved
relates to NETVIRT-1315 Troubleshooting Controller CSIT In Progress
Epic Link: Clustering Stability

 Description   

The upstream (u/s) CSIT generally runs this test in tell mode. As an experiment, we ran this test (and others) in ask mode, and it failed.

What I see is:

  1. odl1 is the initial leader for the default-config shard
  2. the test removes the shard replica from odl1
  3. odl3 becomes the default-config shard leader
  4. odl2 receives a LeaderStateChanged event to the effect of (3)
  5. odl1 does NOT receive the LeaderStateChanged event (??)
  6. transactions that were in the pipeline on odl1 all fail and this message is emitted in the logs:

2018-08-01T12:18:19,503 | WARN | opendaylight-cluster-data-akka.actor.default-dispatcher-37 | LocalThreePhaseCommitCohort | 288 - org.opendaylight.controller.sal-distributed-datastore - 1.7.3.SNAPSHOT | Failed to prepare transaction member-1-datastore-config-fe-0-chn-10-txn-509-0 on backend
org.opendaylight.controller.cluster.datastore.exceptions.NoShardLeaderException: Could not process ready local transaction member-1-datastore-config-fe-0-chn-10-txn-509-0. Shard member-1-shard-default-config currently has no leader. Try again later.
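
To see how many in-flight transactions hit this failure in a given karaf log, a small script along the following lines may help. It is only a sketch: it assumes the pipe-separated layout of the excerpt above (timestamp | level | thread | logger | bundle | message), and the names `find_prepare_failures` and `PrepareFailure` are made up for illustration and are not part of any OpenDaylight tooling.

```python
import re
from typing import Iterable, List, NamedTuple

# Transaction identifier as it appears in the excerpt above, e.g.
# member-1-datastore-config-fe-0-chn-10-txn-509-0 (pattern inferred from that one sample).
TXN_ID = re.compile(r"member-\d+-datastore-\S+?-txn-\d+-\d+")


class PrepareFailure(NamedTuple):
    """Hypothetical record type: one failed prepare found in the log."""
    timestamp: str
    txn_id: str


def find_prepare_failures(lines: Iterable[str]) -> List[PrepareFailure]:
    """Collect 'Failed to prepare transaction' warnings from karaf log lines."""
    failures = []
    for line in lines:
        # Assumed layout: timestamp | level | thread | logger | bundle | message
        fields = [field.strip() for field in line.split(" | ", 5)]
        if len(fields) < 6:
            continue  # continuation line, e.g. the exception text on the next line
        timestamp, level, _thread, logger, _bundle, message = fields
        if level != "WARN" or logger != "LocalThreePhaseCommitCohort":
            continue
        if not message.startswith("Failed to prepare transaction"):
            continue
        match = TXN_ID.search(message)
        if match:
            failures.append(PrepareFailure(timestamp, match.group(0)))
    return failures
```

For example, `find_prepare_failures(open("odl1.log"))` would list the affected transactions on odl1, which can then be compared against the time at which the shard replica was removed.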



 Comments   
Comment by Josh Hershberg [ 02/Aug/18 ]

Attached please find:

log.html.gz of the sandbox CSIT run

odl[1-3].log - the karaf logs of that specific test.

Comment by Tom Pantelis [ 02/Aug/18 ]

For "5. odl1 does NOT receive the LeaderStateChanged event", I would expect that, since the shard replica was removed from odl1 in step 2. With ask-based, it is expected that transactions in flight may fail; this is primarily what tell-based is designed to alleviate. I assume the test passes with tell-based, which is presumably why it only runs with tell-based.

So I don't think there's any bug here, other than the known limitation with ask-based, unless I'm missing something.

Comment by Josh Hershberg [ 06/Aug/18 ]

Yes, I thought this was the case but wanted to make sure. Thanks.

Comment by Josh Hershberg [ 06/Aug/18 ]

See the comment from Tom P.

Generated at Wed Feb 07 19:56:37 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.