[NETCONF-705] Netconf lock/unlock issue when multiple request arrives for the same device Created: 26/Jun/20 Updated: 02/Jul/21 Resolved: 02/Jul/21 |
|
| Status: | Resolved |
| Project: | netconf |
| Component/s: | netconf |
| Affects Version/s: | None |
| Fix Version/s: | 1.13.1 |
| Type: | Bug | Priority: | High |
| Reporter: | Sarguna Dharani | Assignee: | Sarguna Dharani |
| Resolution: | Done | Votes: | 0 |
| Labels: | pt | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Tested scenario 1 with confd device. Attached request details and confd logs for reference. |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
when multiple requests for the same device arrives:
Scenario 1: (Tested with confd) When device supports both the capability :candidate and :writable-running :
Even though lock has been denied for the lock transaction , Netconf still continues to execute get-config, edit-config and commit transactions Later it checks results of all above transactions , if any of the transactions fails then "discard-changes" will be executed. But commit transaction is already done and therefore all the changes will be updated to running configuration datastore
Scenario 2 : when device supports only :writable-running : As per code analysis, It will continue with get-config, edit-config and unlock transactions to running configuration datastore even though lock transaction has been denied.
Response from Restconf : Restconf returns error response but netconf internally executed all the rpc transactions to the device <errors xmlns="urn:ietf:params:xml:ns:yang:ietf-restconf"> <error> <error-type>application</error-type> <error-tag>operation-failed</error-tag> <error-message>Commit of transaction org.opendaylight.netconf.sal.connect.netconf.sal.tx.WriteCandidateRunningTx@3c57aac1 failed</error-message> <error-info>RemoteDevice{confd-netconf-device}:RPC during tx failed. the configuration database is locked by session 16 admin ssh (netconf from XXX.XX.XX.XX ) on since 2020-06-26 13:47:19<session-id>16</session-id>the configuration database is locked by session 16 admin ssh (netconf from XXX.XX.XX.XX ) on since 2020-06-26 13:47:19<session-id>16</session-id></error-info> </error> </errors>
|
| Comments |
| Comment by Jamo Luhrsen [ 26/Jun/20 ] |
|
Sarguna_Dharani, do you have the body you are using to mount this device? Also, I saw you assigned this to yourself. Do you have plans to fix? Let me know how I can help. |
| Comment by Sarguna Dharani [ 26/Jun/20 ] |
|
Hi jluhrsen, Thanks for the reply. Yes, I have committed the fix for it. And I have used below payload for mounting confd device, <node xmlns="urn:TBD:params:xml:ns:yang:network-topology"> |
| Comment by Jamo Luhrsen [ 01/Jul/20 ] |
|
Hi Sarguna_Dharani, I also commented in the gerrit but wondering if you had the bandwidth to add unit test coverage for these cases? |
| Comment by Jamo Luhrsen [ 03/Jul/20 ] |
|
Sarguna, I pulled your code locally and built it to test. I have a Junos device that is only :candidate and I still see the two curl commands back to back to delete an existing interface (small 3s sleep between to make reading the logs easier). Am I following this correctly? Should the 2nd request get the lock denied and just quit at that point responding to the |
| Comment by Sarguna Dharani [ 03/Jul/20 ] |
|
Hi Jamo, Can you please check whether you get lock-denied for the 2nd request from the device When I tested this case, testtool doesn't give rpc-reply as lock-denied for the 2nd request You can check this by executing 2 lock request simultaneously
If device accepts the second lock , then definitely it will continues with other transaction. If device denies the second lock , it should immediately stop and respond error to calling client.
If lock has been denied in your testing for second request , can you please include netconf and device logs here
|
| Comment by Sarguna Dharani [ 03/Jul/20 ] |
|
Hi Jamo, I found the issue. The thing is DELETE works in different way , it uses "exists" functionality of MDSAL to trigger get-config transactions. But PUT calls uses "read" functionality of MDSAL to trigger get-config transactions. Only in read calls , there is check for lock transactions I am adding the fix for it.
|
| Comment by Jamo Luhrsen [ 06/Jul/20 ] |
|
great! thanks Sarguna I'll work to get these scenarios covered in CSIT as well. Not sure how easy |
| Comment by Jamo Luhrsen [ 07/Jul/20 ] |
|
cmanojwill take a look at doing this CSIT |
| Comment by Jamo Luhrsen [ 23/Jul/20 ] |
|
Hi Sarguna, wondering if you have any update on this? |
| Comment by Sarguna Dharani [ 03/Aug/20 ] |
|
Hi Jamo, Included changes for both Delete calls and Unit Testing. |
| Comment by Jamo Luhrsen [ 03/Aug/20 ] |
|
Thank you Sarguna_Dharani I've tested locally with the distribution built from your patch and |
| Comment by Robert Varga [ 28/Oct/20 ] |
|
So reading through this a bit more, the RESTCONF seems to be okay – we can improve it by adding a dedicated case. The problem is what sal-netconf-connector does, which is that it plows on even in case of rejection. I think the patch is going to be more involved. |
| Comment by Vladyslav Marchenko [ 09/Dec/20 ] |
|
Hi jluhrsen |
| Comment by Jamo Luhrsen [ 09/Dec/20 ] |
|
no, I don't think any CSIT was written for this yet unfortunately. I had intended to come up with something eventually but |
| Comment by Oleksii Mozghovyi [ 09/Mar/21 ] |
|
I've changed the design slightly; it should only execute operations from the transaction if the lock operation is completed successfully. I noticed a couple of other issues during testing, like an unexpected lock/unlock state that happened because of parallel transactions, so I had to fix that as well.
The patch is available here - https://git.opendaylight.org/gerrit/c/netconf/+/95359; for some reason, JIRA associates my patch with a different issue |