[NETCONF-418] Failed to process scaled data response from the netconf enabled device Created: 10/May/17  Updated: 06/Aug/19  Resolved: 06/Aug/19

Status: Resolved
Project: netconf
Component/s: netconf
Affects Version/s: None
Fix Version/s: Fluorine SR1

Type: Bug
Reporter: BABU K T Assignee: Jeff Hartley
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: Text File Bug_8413.txt    
External issue ID: 8413

 Description   

Version: Boron

Controller is not able to process scaled data .
It got stuck for more than 10 minutes and not processing any other RESTCONF queries in this duration.

Xml output is about 145000 lines

POST /restconf/operations/network-topology:network-topology/topology/topology-netconf/node/bag2r/yang-ext:mount/show-route:get-route-information HTTP/1.1
Host: 10.209.1.8:8181
Accept: application/xml
Authorization: Basic
Content-Type: application/xml

Body:
<input>
</input>

Karaf Log:

017-05-10 14:12:12,658 | WARN | oupCloseable-3-7 | NetconfXMLToMessageDecoder | 98 - org.opendaylight.netconf.netty-util - 1.1.1.Boron-SR1 | XML message with unwanted leading bytes detected. Discarded the 1 leading byte(s): '0a'
2017-05-10 14:14:04,466 | WARN | ssing-executor-8 | KeepaliveSalFacade | 263 - org.opendaylight.netconf.sal-netconf-connector - 1.4.1.Boron-SR1 | RemoteDevice

{bag2r}: Rpc failure detected. Reconnecting netconf session
java.util.concurrent.CancellationException: Task was cancelled.
at com.google.common.util.concurrent.AbstractFuture.cancellationExceptionWithCause(AbstractFuture.java:392)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1308)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:134)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.Futures$ChainingListenableFuture.cancel(Futures.java:871)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.ForwardingFuture.cancel(ForwardingFuture.java:48)[38:com.google.guava:18.0.0]
at org.opendaylight.netconf.sal.connect.netconf.sal.KeepaliveSalFacade$RequestTimeoutTask.run(KeepaliveSalFacade.java:254)[263:org.opendaylight.netconf.sal-netconf-connector:1.4.1.Boron-SR1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)[:1.8.0_121]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)[:1.8.0_121]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)[:1.8.0_121]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)[:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)[:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)[:1.8.0_121]
at java.lang.Thread.run(Thread.java:745)[:1.8.0_121]
Caused by: java.util.concurrent.CancellationException: Future.cancel() was called.
at com.google.common.util.concurrent.AbstractFuture$Sync.complete(AbstractFuture.java:378)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.AbstractFuture$Sync.cancel(AbstractFuture.java:355)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:131)[38:com.google.guava:18.0.0]
... 10 more
2017-05-10 14:14:04,468 | ERROR | ssing-executor-8 | ExecutionList | 38 - com.google.guava - 18.0.0 | RuntimeException while executing runnable com.google.common.util.concurrent.Futures$6@5523530d with executor INSTANCE
java.lang.IllegalStateException: RemoteDevice{bag2r}

: Unable to reconnect, session listener is missing
at com.google.common.base.Preconditions.checkState(Preconditions.java:197)[38:com.google.guava:18.0.0]
at org.opendaylight.netconf.sal.connect.netconf.sal.KeepaliveSalFacade.reconnect(KeepaliveSalFacade.java:117)[263:org.opendaylight.netconf.sal-netconf-connector:1.4.1.Boron-SR1]
at org.opendaylight.netconf.sal.connect.netconf.sal.KeepaliveSalFacade.access$700(KeepaliveSalFacade.java:47)[263:org.opendaylight.netconf.sal-netconf-connector:1.4.1.Boron-SR1]
at org.opendaylight.netconf.sal.connect.netconf.sal.KeepaliveSalFacade$ResetKeepalive.onFailure(KeepaliveSalFacade.java:234)[263:org.opendaylight.netconf.sal-netconf-connector:1.4.1.Boron-SR1]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1313)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:134)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.Futures$ChainingListenableFuture.cancel(Futures.java:871)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.ForwardingFuture.cancel(ForwardingFuture.java:48)[38:com.google.guava:18.0.0]
at org.opendaylight.netconf.sal.connect.netconf.sal.KeepaliveSalFacade$RequestTimeoutTask.run(KeepaliveSalFacade.java:254)[263:org.opendaylight.netconf.sal-netconf-connector:1.4.1.Boron-SR1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)[:1.8.0_121]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)[:1.8.0_121]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)[:1.8.0_121]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)[:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)[:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)[:1.8.0_121]
at java.lang.Thread.run(Thread.java:745)[:1.8.0_121]
2017-05-10 14:14:12,671 | WARN | oupCloseable-3-7 | NetconfXMLToMessageDecoder | 98 - org.opendaylight.netconf.netty-util - 1.1.1.Boron-SR1 | XML message with unwanted leading bytes detected. Discarded the 1 leading byte(s): '0a'
2017-05-10 14:16:12,682 | WARN | oupCloseable-3-7 | NetconfXMLToMessageDecoder | 98 - org.opendaylight.netconf.netty-util - 1.1.1.Boron-SR1 | XML message with unwanted leading bytes detected. Discarded the 1 leading byte(s): '0a'
2017-05-10 14:18:12,711 | WARN | oupCloseable-3-7 | NetconfXMLToMessageDecoder | 98 - org.opendaylight.netconf.netty-util - 1.1.1.Boron-SR1 | XML message with unwanted leading bytes detected. Discarded the 1 leading byte(s): '0a'
2017-05-10 14:18:34,274 | WARN | ssionScavenger-3 | teInvalidatingHashSessionManager | 209 - org.ops4j.pax.web.pax-web-jetty - 3.2.9 | Timing out for 1 session(s) with id 1xowhlhw1fa981o0a60d277q04
2017-05-10 14:20:12,722 | WARN | oupCloseable-3-7 | NetconfXMLToMessageDecoder | 98 - org.opendaylight.netconf.netty-util - 1.1.1.Boron-SR1 | XML message with unwanted leading bytes detected. Discarded the 1 leading byte(s): '0a'
2017-05-10 14:22:12,752 | WARN | oupCloseable-3-7 | NetconfXMLToMessageDecoder | 98 - org.opendaylight.netconf.netty-util - 1.1.1.Boron-SR1 | XML message with unwanted leading bytes detected. Discarded the 1 leading byte(s): '0a'
2017-05-10 14:23:27,707 | WARN | ssing-executor-4 | KeepaliveSalFacade | 263 - org.opendaylight.netconf.sal-netconf-connector - 1.4.1.Boron-SR1 | RemoteDevice

{bag2r}: Rpc failure detected. Reconnecting netconf session
java.util.concurrent.CancellationException: Task was cancelled.
at com.google.common.util.concurrent.AbstractFuture.cancellationExceptionWithCause(AbstractFuture.java:392)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1308)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:134)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.Futures$ChainingListenableFuture.cancel(Futures.java:871)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.ForwardingFuture.cancel(ForwardingFuture.java:48)[38:com.google.guava:18.0.0]
at org.opendaylight.netconf.sal.connect.netconf.sal.KeepaliveSalFacade$RequestTimeoutTask.run(KeepaliveSalFacade.java:254)[263:org.opendaylight.netconf.sal-netconf-connector:1.4.1.Boron-SR1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)[:1.8.0_121]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)[:1.8.0_121]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)[:1.8.0_121]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)[:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)[:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)[:1.8.0_121]
at java.lang.Thread.run(Thread.java:745)[:1.8.0_121]
Caused by: java.util.concurrent.CancellationException: Future.cancel() was called.
at com.google.common.util.concurrent.AbstractFuture$Sync.complete(AbstractFuture.java:378)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.AbstractFuture$Sync.cancel(AbstractFuture.java:355)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:131)[38:com.google.guava:18.0.0]
... 10 more
2017-05-10 14:23:27,708 | ERROR | ssing-executor-4 | ExecutionList | 38 - com.google.guava - 18.0.0 | RuntimeException while executing runnable com.google.common.util.concurrent.Futures$6@19b1fcfc with executor INSTANCE
java.lang.IllegalStateException: RemoteDevice{bag2r}

: Unable to reconnect, session listener is missing
at com.google.common.base.Preconditions.checkState(Preconditions.java:197)[38:com.google.guava:18.0.0]
at org.opendaylight.netconf.sal.connect.netconf.sal.KeepaliveSalFacade.reconnect(KeepaliveSalFacade.java:117)[263:org.opendaylight.netconf.sal-netconf-connector:1.4.1.Boron-SR1]
at org.opendaylight.netconf.sal.connect.netconf.sal.KeepaliveSalFacade.access$700(KeepaliveSalFacade.java:47)[263:org.opendaylight.netconf.sal-netconf-connector:1.4.1.Boron-SR1]
at org.opendaylight.netconf.sal.connect.netconf.sal.KeepaliveSalFacade$ResetKeepalive.onFailure(KeepaliveSalFacade.java:234)[263:org.opendaylight.netconf.sal-netconf-connector:1.4.1.Boron-SR1]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1313)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:134)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.Futures$ChainingListenableFuture.cancel(Futures.java:871)[38:com.google.guava:18.0.0]
at com.google.common.util.concurrent.ForwardingFuture.cancel(ForwardingFuture.java:48)[38:com.google.guava:18.0.0]
at org.opendaylight.netconf.sal.connect.netconf.sal.KeepaliveSalFacade$RequestTimeoutTask.run(KeepaliveSalFacade.java:254)[263:org.opendaylight.netconf.sal-netconf-connector:1.4.1.Boron-SR1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)[:1.8.0_121]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)[:1.8.0_121]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)[:1.8.0_121]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)[:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)[:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)[:1.8.0_121]
at java.lang.Thread.run(Thread.java:745)[:1.8.0_121]



 Comments   
Comment by BABU K T [ 10/May/17 ]

XML Response for the POST query mentioned in Description

<errors xmlns="urn:ietf:params:xml:ns:yang:ietf-restconf">
<error>
<error-type>rpc</error-type>
<error-tag>partial-operation</error-tag>
<error-message>The operation was cancelled while executing.</error-message>
</error>
</errors>

Comment by Ryan Goulding [ 13/Jun/17 ]

Is keepalive enabled? This is doing a get-config to keep the ssh session alive... so if it is big you may want to disable keepalive (set to 0).

Comment by Colin Dixon [ 13/Jun/17 ]

Robert Varga also asked if we have data on how long the response took to come back--rather than just it's size. It would also be good to know if it all came back as one slug of data or trickled in.

Comment by BABU K T [ 19/Jun/17 ]

(In reply to Ryan Goulding from comment #2)
> Is keepalive enabled? This is doing a get-config to keep the ssh session
> alive... so if it is big you may want to disable keepalive (set to 0).

yes. Keep Alive Enabled.

<keepalive-delay xmlns="urn:opendaylight:netconf-node-topology">120</keepalive-delay>

And response was sent immediately for post request.

Comment by BABU K T [ 19/Jun/17 ]

(In reply to Colin Dixon from comment #3)
> Robert Varga also asked if we have data on how long the response took to
> come back--rather than just it's size. It would also be good to know if it
> all came back as one slug of data or trickled in.

Netconf device sent response immediately in one shot

Comment by Colin Dixon [ 28/Jun/17 ]

(In reply to BABU K T from comment #4)
> (In reply to Ryan Goulding from comment #2)
> > Is keepalive enabled? This is doing a get-config to keep the ssh session
> > alive... so if it is big you may want to disable keepalive (set to 0).
>
> yes. Keep Alive Enabled.
>
> <keepalive-delay
> xmlns="urn:opendaylight:netconf-node-topology">120</keepalive-delay>
>
> And response was sent immediately for post request.

Have you tried setting keepalive to 0 to see if that fixes it?

Comment by Tomas Cere [ 29/Jun/17 ]

The fact that odl got suck makes me think it started getting low on memory, I would try increasing the heap and see if you can get it to swallow such a large message.

Comment by BABU K T [ 30/Jun/17 ]

(In reply to Tomas Cere from comment #7)
> The fact that odl got suck makes me think it started getting low on memory,
> I would try increasing the heap and see if you can get it to swallow such a
> large message.

Can I do configure something on controller to increase size ? or it should be fixed via code ?

Comment by BABU K T [ 02/Jul/17 ]

(In reply to Colin Dixon from comment #6)
> (In reply to BABU K T from comment #4)
> > (In reply to Ryan Goulding from comment #2)
> > > Is keepalive enabled? This is doing a get-config to keep the ssh session
> > > alive... so if it is big you may want to disable keepalive (set to 0).
> >
> > yes. Keep Alive Enabled.
> >
> > <keepalive-delay
> > xmlns="urn:opendaylight:netconf-node-topology">120</keepalive-delay>
> >
> > And response was sent immediately for post request.
>
> Have you tried setting keepalive to 0 to see if that fixes it?

Setting keepalive to 0 not fixed the issue

Comment by BABU K T [ 02/Jul/17 ]

(In reply to BABU K T from comment #9)
> (In reply to Colin Dixon from comment #6)
> > (In reply to BABU K T from comment #4)
> > > (In reply to Ryan Goulding from comment #2)
> > > > Is keepalive enabled? This is doing a get-config to keep the ssh session
> > > > alive... so if it is big you may want to disable keepalive (set to 0).
> > >
> > > yes. Keep Alive Enabled.
> > >
> > > <keepalive-delay
> > > xmlns="urn:opendaylight:netconf-node-topology">120</keepalive-delay>
> > >
> > > And response was sent immediately for post request.
> >
> > Have you tried setting keepalive to 0 to see if that fixes it?
>
> Setting keepalive to 0 not fixed the issue

Tried again by setting keepalive to 0, For 50K lines lines of XML, controller took ~300 secs to process response. For 20 Million lines it took 60 mins.

Comment by Sai MarapaReddy [ 18/Jul/17 ]

Questions from kernel call :-
What is the new heap size ?
What are results on Carbon / Nitrogen ?
Robert / Tomas need performance snapshots from whole RAM and memory dump for further investigation

Comment by BABU K T [ 11/Aug/17 ]

Heap size data as requested in comment 11

Comment by BABU K T [ 11/Aug/17 ]

Attachment Bug_8413.txt has been added with description: Heap size data as requested in comment 11

Comment by BABU K T [ 11/Aug/17 ]

(In reply to Sai MarapaReddy from comment #11)
> Questions from kernel call :-
> What is the new heap size ?
> What are results on Carbon / Nitrogen ?
> Robert / Tomas need performance snapshots from whole RAM and memory dump for
> further investigation

Please check attachment added in comment 12 . log has been taken with carbon release

Comment by Atul Gosain [ 11/Sep/17 ]

Hi

As per file uploaded by Babu, i see this information regarding heap.
Please update if any other information is required.

[root@nms5-odl-a ~]# jstat -gcutil 1294
S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
47.70 0.00 22.80 63.72 88.66 79.37 38 2.424 8 3.476 5.899
[root@nms5-odl-a ~]# java -XX:+PrintFlagsFinal -version | grep HeapSize
uintx ErgoHeapSizeLimit = 0

{product}
uintx HeapSizePerGCThread = 87241520 {product}

uintx InitialHeapSize := 528482304

{product}
uintx LargePageHeapSizeThreshold = 134217728 {product}

uintx MaxHeapSize := 8434745344

{product}

openjdk version "1.8.0_141"
OpenJDK Runtime Environment (build 1.8.0_141-b16)
OpenJDK 64-Bit Server VM (build 25.141-b16, mixed mode)
[root@nms5-odl-a ~]# jmap -heap 1294
Attaching to process ID 1294, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.141-b16

using thread-local object allocation.
Parallel GC with 13 thread(s)

Heap Configuration:
MinHeapFreeRatio = 0
MaxHeapFreeRatio = 100
MaxHeapSize = 21474836480 (20480.0MB)
NewSize = 44564480 (42.5MB)
MaxNewSize = 7064158104 (6826.5MB)
OldSize = 89653248 (85.5MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)

Comment by JH [ 22/Sep/17 ]

Blocking issue for the netconf connector team: Long parsing jobs appear to be creating a "head of line blocking" situation here; the core/thread handling the netconf-restconf parsing/translation appears to be preventing keepalives (or other netconf transactions) from happening for the mounted device. Disabling keepalives in a production network is undesirable.

Babu has some additional findings on Carbon-SR1:
Not Fixed. Still VM is taking much time to process data
120K Lines : 143s , 190s, 217s in different instances
128K lines XML : 474s
Response is seen only when keepalive is "0". Seeing error "read was cancelled" when keep alive is other than "0" after default timeout 60s. Mentioned karaf error in next column

Here's the error message when a keepalive fails due to translation taking a long time to complete:

<errors xmlns="urn:ietf:params:xml:ns:yang:ietf-restconf">
<error>
<error-type>application</error-type>
<error-tag>operation-failed</error-tag>
<error-message>read was cancelled.</error-message>
<error-info>java.util.concurrent.CancellationException: Task was cancelled.
at com.google.common.util.concurrent.AbstractFuture.cancellationExceptionWithCause(AbstractFuture.java:392)
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)
at org.opendaylight.yangtools.util.concurrent.MappingCheckedFuture.get(MappingCheckedFuture.java:70)
at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1308)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:134)
at com.google.common.util.concurrent.Futures$ChainingListenableFuture.cancel(Futures.java:871)
at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:898)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:134)
at com.google.common.util.concurrent.Futures$ChainingListenableFuture.cancel(Futures.java:871)
at com.google.common.util.concurrent.ForwardingFuture.cancel(ForwardingFuture.java:48)
at org.opendaylight.netconf.sal.connect.netconf.sal.KeepaliveSalFacade$RequestTimeoutTask.run(KeepaliveSalFacade.java:254)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CancellationException: Future.cancel() was called.
at com.google.common.util.concurrent.AbstractFuture$Sync.complete(AbstractFuture.java:378)
at com.google.common.util.concurrent.AbstractFuture$Sync.cancel(AbstractFuture.java:355)
at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:131)
... 16 more
</error-info>
</error>
</errors>

Comment by Tomas Cere [ 04/Oct/17 ]

Let me clarify, we need a performance snapshot from a profiler(for example yourkit) of the whole test run, and a memory snapshot from the moment that odl seems to get stuck. But if this will show that theres no memory leak and just odl running out of memory due to parsing everything in memory you will have to increase the heap size.

To increase the heap size you need to edit the karaf setenv script in karaf/bin to something larger than 2gbs.

Comment by Jeff Hartley [ 26/Oct/17 ]

Latest testing with Carbon SR-1, JAVA_MAX_MEM=64G... We're at the point that adding more cores and RAM does not improve performance. The earlier discussion helped. The "head of line blocking" is understood to be "as-designed" now, based on previous threads/discussions read.

We've done some performance tests comparing a standalone Python Netconf client (PyEz) to the same network element (running JunOS 17.4), against the same RPC (rpc:get_interface_queue_information – essentially show all queue info).

PyEz netconf + OpenSSH_7.4p1 w/default settings, 202K lines of xml in ~30sec.
Carbon SR-1 netconf + Apache Mina = ~6 minutes.

Perhaps the best solution for these "very large payload" operational RPCs is to work on tuning the Apache Mina SSH configuration for better performance? Can this be improved via tuning buffer sizes, TCP settings, etc., or does porting a newer version of Mina solve performance issues?

Thanks

Comment by Robert Varga [ 28/Aug/18 ]

malloc_failed how does Oxygen SR3 look like? Also, can you provide representative data input, so we can provide CPU/mem usage during the request?

Comment by Jeff Hartley [ 06/Aug/19 ]

The performance issues have been resolved by a combination of activities since Carbon:
--The original JunOS yang was completely refactored and is much more performant vs yangtools
--Yangtools itself has received multiple enhancements
--Modern controller performance tuning and scaling best practices (along with Java 1.8 enhancements) have changed the environment significantly for the better.

Generated at Wed Feb 07 20:14:58 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.