[NETCONF-705] Netconf lock/unlock issue when multiple request arrives for the same device Created: 26/Jun/20  Updated: 02/Jul/21  Resolved: 02/Jul/21

Status: Resolved
Project: netconf
Component/s: netconf
Affects Version/s: None
Fix Version/s: 1.13.1

Type: Bug Priority: High
Reporter: Sarguna Dharani Assignee: Sarguna Dharani
Resolution: Done Votes: 0
Labels: pt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Tested scenario 1 with confd device.

Attached request details and confd logs for reference.


Attachments: Microsoft Word Lock_Unlock_issue_Simultaneous_call.docx    
Issue Links:
Blocks
blocks NETCONF-746 YANG patch request on NETCONF mountpo... Resolved

 Description   

when multiple requests for the same device arrives:

 

Scenario 1: (Tested with confd)

When device supports both the capability  :candidate  and :writable-running :

 

Even though lock has been denied for the lock transaction , Netconf still continues to execute get-config, edit-config and commit transactions 

Later it checks results of all above transactions , if any of the transactions fails then "discard-changes" will be executed.

But commit transaction is already done and therefore all the changes will be updated to running configuration datastore

 

Scenario 2 :

when device supports only :writable-running :

As per code analysis, It will continue with get-config, edit-config and unlock transactions to running configuration datastore even though lock transaction has been denied.

 

Response from Restconf :

Restconf returns error response but netconf internally executed all the rpc transactions to the device

<errors xmlns="urn:ietf:params:xml:ns:yang:ietf-restconf">

<error>

<error-type>application</error-type>

<error-tag>operation-failed</error-tag>

<error-message>Commit of transaction org.opendaylight.netconf.sal.connect.netconf.sal.tx.WriteCandidateRunningTx@3c57aac1 failed</error-message>

<error-info>RemoteDevice{confd-netconf-device}:RPC during tx failed. the configuration database is locked by session 16 admin ssh (netconf from XXX.XX.XX.XX ) on since 2020-06-26 13:47:19<session-id>16</session-id>the configuration database is locked by session 16 admin ssh (netconf from XXX.XX.XX.XX ) on since 2020-06-26 13:47:19<session-id>16</session-id></error-info>

</error>

</errors>

 

 



 Comments   
Comment by Jamo Luhrsen [ 26/Jun/20 ]

Sarguna_Dharani, do you have the body you are using to mount this device?

Also, I saw you assigned this to yourself. Do you have plans to fix?

Let me know how I can help.

Comment by Sarguna Dharani [ 26/Jun/20 ]

Hi jluhrsen,

Thanks for the reply. Yes, I have committed the fix for it.

And I have used below payload for mounting confd device,

<node xmlns="urn:TBD:params:xml:ns:yang:network-topology">
<node-id>confd-netconf-device</node-id>
<host xmlns="urn:opendaylight:netconf-node-topology">XXX.XX.XX.XX</host>
<port xmlns="urn:opendaylight:netconf-node-topology">2022</port>
<username xmlns="urn:opendaylight:netconf-node-topology">admin</username>
<password xmlns="urn:opendaylight:netconf-node-topology">root</password>
<tcp-only xmlns="urn:opendaylight:netconf-node-topology">false</tcp-only>
</node>

Comment by Jamo Luhrsen [ 01/Jul/20 ]

Hi Sarguna_Dharani, I also commented in the gerrit but wondering if you had the bandwidth to add unit test coverage for these cases?

Comment by Jamo Luhrsen [ 03/Jul/20 ]

Sarguna, I pulled your code locally and built it to test. I have a Junos device that is only :candidate and I still see the
problem I think is reported here.

two curl commands back to back to delete an existing interface (small 3s sleep between to make reading the logs easier).
I see the first request grab the lock, get-config, edit-config, commit, unlock... Somewhere before the unlock the 2nd
request tries to get the lock and it fails, but things just continue the same with that request... get-config, edit-config,
commit, unlock.

Am I following this correctly? Should the 2nd request get the lock denied and just quit at that point responding to the
calling client about the error? Is that what you see with your confd environment?

Comment by Sarguna Dharani [ 03/Jul/20 ]

Hi Jamo,

Can you please check whether you get lock-denied for the 2nd request from the device

When I tested this case, testtool doesn't give rpc-reply as lock-denied for the 2nd request

You can check this by executing 2 lock request simultaneously

 

http://localhost:8181/restconf/operations/network-topology:network-topology/topology/topology-netconf/node/<device-name>/yang-ext:mount/ietf-netconf:lock

If device accepts the second lock , then definitely it will continues with other transaction.

If device denies the second lock , it should immediately stop and respond error to calling client.

 

If lock has been denied in your testing for second request , can you please include netconf and device logs here  

 

Comment by Sarguna Dharani [ 03/Jul/20 ]

Hi Jamo,

I found the issue. The thing is DELETE works in different way , it uses "exists" functionality of MDSAL to trigger get-config transactions.

But PUT calls uses "read" functionality of MDSAL to trigger get-config transactions.

Only in read calls , there is check for lock transactions

I am adding the fix for it.

 

Comment by Jamo Luhrsen [ 06/Jul/20 ]

great! thanks Sarguna

I'll work to get these scenarios covered in CSIT as well. Not sure how easy
the UT would be, but maybe CSIT will be enough to help catch regressions.

Comment by Jamo Luhrsen [ 07/Jul/20 ]

cmanojwill take a look at doing this CSIT

Comment by Jamo Luhrsen [ 23/Jul/20 ]

Hi Sarguna, wondering if you have any update on this?

Comment by Sarguna Dharani [ 03/Aug/20 ]

Hi Jamo, 

Included changes for both Delete calls and Unit Testing. 

Comment by Jamo Luhrsen [ 03/Aug/20 ]

Thank you Sarguna_Dharani I've tested locally with the distribution built from your patch and
things seem to work as expected.

Comment by Robert Varga [ 28/Oct/20 ]

So reading through this a bit more, the RESTCONF seems to be okay – we can improve it by adding a dedicated case. The problem is what sal-netconf-connector does, which is that it plows on even in case of rejection. I think the patch is going to be more involved.

Comment by Vladyslav Marchenko [ 09/Dec/20 ]

Hi jluhrsen
Is there any CSIT already covers these scenarios?
Have prepared changes for it. And I am wondering how I can properly test it

Comment by Jamo Luhrsen [ 09/Dec/20 ]

no, I don't think any CSIT was written for this yet unfortunately. I had intended to come up with something eventually but
I no longer have the cycles for it.

Comment by Oleksii Mozghovyi [ 09/Mar/21 ]

Hi Sarguna and jluhrsen,

I've changed the design slightly; it should only execute operations from the transaction if the lock operation is completed successfully. I noticed a couple of other issues during testing, like an unexpected lock/unlock state that happened because of parallel transactions, so I had to fix that as well.

 

The patch is available here - https://git.opendaylight.org/gerrit/c/netconf/+/95359; for some reason, JIRA associates my patch with a different issue

Generated at Wed Feb 07 20:15:41 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.