[NETCONF-93] "Failed to initialize transaction" when removing a Netconf device Created: 28/Oct/15  Updated: 15/Mar/19  Resolved: 19/Jan/16

Status: Resolved
Project: netconf
Component/s: netconf
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Jozef Behran Assignee: Tomas Cere
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File dev124.log    
External issue ID: 4547

 Description   

When an attempt to unmount a netconf device is made through restconf, sometimes the response is:

"error-tag":"operation-failed"
"error-message":"Error while deleting data"
"error-info":"java.lang.RuntimeException: RemoteDevice{controller-config}: Failed to prepare candidate. Failed to initialize transaction ..."

This bug can be hit fairly reliably (over 90% hit rate) by mounting 200 or more devices, querying each of them for data in rapid succession (possibly from a multithreaded process), and then unmounting them. The bug hits somewhere between device number 100 and 140. An unfinished test suite doing this is available here: https://git.opendaylight.org/gerrit/#/c/28730.
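
The reproduction steps above can be outlined roughly as follows. This is only a sketch and not the test suite from the Gerrit change: the `mount`, `query` and `unmount` callables, the `device-N` names and the worker count are all hypothetical placeholders that a caller would have to back with real restconf requests.

```python
from concurrent.futures import ThreadPoolExecutor


def reproduce(mount, query, unmount, device_count=200, workers=10):
    """Mount devices, query them concurrently, then unmount them one by one.

    mount, query and unmount are hypothetical callables taking a device name;
    they stand in for whatever restconf calls the real test performs.
    Returns a list of (position, device name, error text) for failed unmounts.
    """
    names = [f"device-{i}" for i in range(1, device_count + 1)]

    for name in names:
        mount(name)

    # Query all devices in rapid succession from several threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(query, names))

    failures = []
    for position, name in enumerate(names, start=1):
        try:
            unmount(name)
        except Exception as exc:  # record the failure and keep going
            failures.append((position, name, str(exc)))
    return failures
```

Recording the position of each failed unmount makes it easy to check whether failures cluster in the range reported above.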

Attached is a log file fragment from the time when one of these failures was observed. It is fairly large, but the first exception in it suggests that the code handling the connection to netconf somehow lost track of the netconf message IDs and choked. This might also be a restconf problem, but it occurs in netconf and is therefore reported as a netconf bug.
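
For context on what "keeping track of message IDs" involves: in NETCONF each `<rpc>` carries a `message-id` attribute and the matching `<rpc-reply>` echoes it back. The following is a minimal illustration of matching replies to pending requests by that ID. It is not the controller's actual code; the class and method names are made up for this sketch.

```python
class UnexpectedReply(Exception):
    """Raised when a reply arrives whose message-id matches no pending request."""


class ReplyCorrelator:
    """Hypothetical helper that pairs replies with requests by message-id."""

    def __init__(self):
        self._next_id = 0
        self._pending = {}  # message-id -> description of the request

    def register(self, description):
        """Allocate a fresh message-id for an outgoing request."""
        self._next_id += 1
        message_id = str(self._next_id)
        self._pending[message_id] = description
        return message_id

    def on_reply(self, message_id):
        """Return the request this reply belongs to, or fail loudly."""
        try:
            return self._pending.pop(message_id)
        except KeyError:
            raise UnexpectedReply(
                f"no pending request with message-id {message_id!r}"
            ) from None
```

Looking replies up by ID, instead of assuming they arrive in the order the requests were sent, means an out-of-order or duplicate reply is detected instead of being silently attributed to the wrong request.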



 Comments   
Comment by Jozef Behran [ 28/Oct/15 ]

Attachment dev124.log has been added with description: Log file fragment with errors deleting a netconf device

Comment by Jozef Behran [ 06/Nov/15 ]

Occurs in Lithium-snapshot.

Comment by Tomas Cere [ 24/Nov/15 ]

Probably also present on Beryllium, but this needs to be verified there as well.

Comment by Vratko Polak [ 26/Nov/15 ]

Beryllium status is (hopefully) tracked by this daily CSIT job (target 600 devices):
https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-1node-periodic-scale-only-beryllium/

There are also stable-lithium and *-all jobs in place.

Comment by Jozef Behran [ 30/Nov/15 ]

Target is 500 devices.

600 devices won't fit into memory due to NETCONF-88. More specifically, it fits now, but once the multi-threaded GET suite gets merged as well, it won't fit anymore, as NETCONF-88 will be hit twice, leaking enough memory to cause an OutOfMemoryError.

Comment by Tomas Cere [ 08/Jan/16 ]

https://git.opendaylight.org/gerrit/#/c/32272/

This patch fixes the message order mix-up. There is still an issue when configuring the netconf connector asynchronously: since a configuration consists of lock-edit-commit-unlock, if multiple configurations are sent asynchronously through one session, the operations can interfere; for example, two commits can be called right after each other.
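
To illustrate the remaining interference problem, here is a small simulation of one way to keep whole lock-edit-commit-unlock sequences from interleaving on a shared session: hold a guard for the entire sequence. This is an assumption-laden sketch, not what the patch above does. `FakeSession` is a hypothetical stand-in that only records RPC names and talks to no device.

```python
import threading

STEPS = ("lock", "edit-config", "commit", "unlock")


class FakeSession:
    """Hypothetical stand-in for one NETCONF session; it only logs RPC names."""

    def __init__(self):
        self.sent = []  # (caller, rpc) in the order they were "sent"
        self._guard = threading.Lock()  # serializes whole configurations

    def configure(self, caller):
        # Hold the guard for the entire lock..unlock sequence so that no
        # other caller can slip its own RPCs in between.
        with self._guard:
            for rpc in STEPS:
                self.sent.append((caller, rpc))


def is_serialized(sent):
    """True if the log is a run of complete four-step sequences, one caller each."""
    if len(sent) % len(STEPS) != 0:
        return False
    for start in range(0, len(sent), len(STEPS)):
        chunk = sent[start:start + len(STEPS)]
        callers = {caller for caller, _ in chunk}
        rpcs = tuple(rpc for _, rpc in chunk)
        if len(callers) != 1 or rpcs != STEPS:
            return False
    return True


def run_concurrent(n_callers=8):
    """Issue n_callers configurations from separate threads on one session."""
    session = FakeSession()
    threads = [
        threading.Thread(target=session.configure, args=(caller,))
        for caller in range(n_callers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return session.sent
```

Without the guard, two callers could produce a log such as lock, lock, edit-config, edit-config, commit, commit, which is the kind of back-to-back commit described above; `is_serialized` rejects such a log.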
