[CONTROLLER-1823] Regression in OF cluster test Created: 05/Apr/18 Updated: 12/Apr/18 Resolved: 07/Apr/18 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | None |
| Fix Version/s: | Carbon, Nitrogen, Oxygen, Fluorine |
| Type: | Bug | Priority: | High |
| Reporter: | Luis Gomez | Assignee: | Robert Varga |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
There is a regression in the OF cluster test: stats collection freezes after disconnecting a switch from its owner or when reconnecting a switch to additional members. This does not happen immediately, but after a few retries. After some investigation, I traced the regression back to this patch and all of its cherry-picks: https://git.opendaylight.org/gerrit/#/c/68900/ |
| Comments |
| Comment by Luis Gomez [ 05/Apr/18 ] |
|
Attached are full traces with cluster debug enabled. In this case the problem started when the switch was connected only to member-1 and then initiated connections to the extra members, member-2 and member-3, at 01:59:32. The WARNs appear on member-1 a few seconds later:

2018-04-02 01:59:51,328 | WARN | ofppool-2 | TransactionContextWrapper | 215 - org.opendaylight.controller.sal-distributed-datastore - 1.6.3.SNAPSHOT | Failed to acquire enqueue operation permit for transaction member-1-datastore-operational-fe-0-chn-6-txn-96-0 on shard inventory

2018-04-02 01:59:56,330 | WARN | ofppool-2 | RemoteTransactionContext | 215 - org.opendaylight.controller.sal-distributed-datastore - 1.6.3.SNAPSHOT | Failed to acquire execute operation permit for transaction member-1-datastore-operational-fe-0-chn-6-txn-96-0 on actor ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@192.168.0.102:2550/), Path(/user/shardmanager-operational/member-2-shard-inventory-operational/shard-inventory-member-1:datastore-operational@0:6-96_797#643731958)] |
| Comment by Robert Varga [ 05/Apr/18 ] |
|
The problem is the stateful semaphore handoff between TransactionContextWrapper and RemoteTransactionContext. RemoteTransactionContext needs to understand when an incoming operation has tried to acquire a permit and whether it was successful in doing so. With the |
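
To illustrate the handoff described in the comment above, here is a minimal Java sketch. It is not the OpenDaylight code or the actual fix. The names PermitHandoffSketch, Limiter, RemoteContext, begin and complete are hypothetical, and the 10 ms timeout is arbitrary. The only idea taken from the comment is that the downstream context is told whether the upstream stage already tried to acquire a permit and whether that attempt succeeded, so that it neither acquires a second permit nor releases one that was never held.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PermitHandoffSketch {

    /** Hypothetical limiter: a fixed pool of operation permits. */
    static final class Limiter {
        private final Semaphore permits;

        Limiter(int maxPermits) {
            this.permits = new Semaphore(maxPermits);
        }

        /** Waits up to timeoutMillis for a permit; returns false on timeout or interrupt. */
        boolean tryAcquire(long timeoutMillis) {
            try {
                return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }

        void release() {
            permits.release();
        }

        int available() {
            return permits.availablePermits();
        }
    }

    /** Hypothetical downstream context that receives operations from an upstream stage. */
    static final class RemoteContext {
        private final Limiter limiter;

        RemoteContext(Limiter limiter) {
            this.limiter = limiter;
        }

        /**
         * Starts an operation.
         *
         * @param upstreamResult null if the upstream stage made no acquire attempt,
         *                       TRUE if it holds a permit, FALSE if it tried and failed
         * @return whether a permit is held for this operation
         */
        boolean begin(Boolean upstreamResult) {
            if (upstreamResult == null) {
                // Nobody tried yet, so this stage makes the one and only attempt.
                return limiter.tryAcquire(10);
            }
            // An attempt was already made upstream: reuse its outcome, do not retry.
            return upstreamResult;
        }

        /** Finishes an operation, releasing the permit only if one was actually held. */
        void complete(boolean holdsPermit) {
            if (holdsPermit) {
                limiter.release();
            }
        }
    }

    public static void main(String[] args) {
        Limiter limiter = new Limiter(1);
        RemoteContext context = new RemoteContext(limiter);

        // The upstream stage acquires the permit and hands the outcome downstream.
        Boolean upstream = limiter.tryAcquire(10);
        boolean held = context.begin(upstream);
        context.complete(held);

        System.out.println("permits available after completion: " + limiter.available());
    }
}
```

In this sketch, a failed upstream attempt is passed down as FALSE, so completing the operation leaves the permit count unchanged. If the outcome were not passed along, the downstream stage could either block on a second acquire or release a permit it never held, and the pool would drift away from its configured size.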
| Comment by Robert Varga [ 05/Apr/18 ] |
| Comment by Robert Varga [ 06/Apr/18 ] |
|
Oxygen: https://git.opendaylight.org/gerrit/70384
Fluorine: https://git.opendaylight.org/gerrit/70439
Carbon: https://git.opendaylight.org/gerrit/70440
|