[NETCONF-994] akka asktimeout in a single node cluster Created: 18/Apr/23  Updated: 29/Sep/23

Status: Open
Project: netconf
Component/s: restconf-nb
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: High
Reporter: Venkatrangan Govindarajan
Assignee: Venkatrangan Govindarajan
Resolution: Unresolved
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
  1. Used NETCONF 5.0.4
  2. JAVA_OPTS=-Xms512M -Xmx12288m -XX:+UseG1GC
  3. Tried mounting 40K devices one after another, with payloads like this:

 8<---------------

 <node xmlns="urn:TBD:params:xml:ns:yang:network-topology">
  <node-id>$NODE_NAME</node-id>
  <host xmlns="urn:opendaylight:netconf-node-topology">172.17.0.2</host>
  <port xmlns="urn:opendaylight:netconf-node-topology">$NODE_PORT</port>
  <username xmlns="urn:opendaylight:netconf-node-topology">admin</username>
  <password xmlns="urn:opendaylight:netconf-node-topology">topsecret</password>
  <tcp-only xmlns="urn:opendaylight:netconf-node-topology">false</tcp-only>
  <reconnect-on-changed-schema xmlns="urn:opendaylight:netconf-node-topology">false</reconnect-on-changed-schema>
  <connection-timeout-millis xmlns="urn:opendaylight:netconf-node-topology">20000</connection-timeout-millis>
  <default-request-timeout-millis xmlns="urn:opendaylight:netconf-node-topology">60000</default-request-timeout-millis>
  <max-connection-attempts xmlns="urn:opendaylight:netconf-node-topology">24</max-connection-attempts>
  <between-attempts-timeout-millis xmlns="urn:opendaylight:netconf-node-topology">60000</between-attempts-timeout-millis>
  <sleep-factor xmlns="urn:opendaylight:netconf-node-topology">1</sleep-factor>
  <keepalive-delay xmlns="urn:opendaylight:netconf-node-topology">300</keepalive-delay>
</node>

----->8-----

4. Sent the PUT requests in sequence, with a sleep of 0.01 seconds after every PUT.

5. After nearly 20K PUTs, ODL stopped responding to PUT requests, although GET requests still worked.
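
The mount loop from the steps above can be sketched roughly as follows. This is only an illustration, not the script actually used: the RESTCONF base URL and port, the admin/admin credentials, the node-N naming, and the starting port are all assumptions, and the payload is abbreviated to the first few leaves of the XML shown above.

```python
import base64
import time
import urllib.request
from string import Template

# ASSUMPTION: RFC 8040 RESTCONF path and default port; adjust to the deployment.
BASE_URL = ("http://localhost:8181/rests/data/"
            "network-topology:network-topology/topology=topology-netconf")

# Abbreviated version of the payload in the description (remaining leaves omitted).
NODE_TEMPLATE = Template(
    '<node xmlns="urn:TBD:params:xml:ns:yang:network-topology">'
    '<node-id>$NODE_NAME</node-id>'
    '<host xmlns="urn:opendaylight:netconf-node-topology">172.17.0.2</host>'
    '<port xmlns="urn:opendaylight:netconf-node-topology">$NODE_PORT</port>'
    '<username xmlns="urn:opendaylight:netconf-node-topology">admin</username>'
    '<password xmlns="urn:opendaylight:netconf-node-topology">topsecret</password>'
    '<tcp-only xmlns="urn:opendaylight:netconf-node-topology">false</tcp-only>'
    '</node>'
)


def build_request(node_name, node_port, user="admin", password="admin"):
    """Build (but do not send) the PUT request that mounts one device."""
    body = NODE_TEMPLATE.substitute(NODE_NAME=node_name, NODE_PORT=node_port)
    req = urllib.request.Request(
        f"{BASE_URL}/node={node_name}", data=body.encode("utf-8"), method="PUT")
    req.add_header("Content-Type", "application/xml")
    # ASSUMPTION: HTTP basic auth with hypothetical controller credentials.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req


def send_http(req):
    """Send the request and return the HTTP status (HTTP errors raise HTTPError)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status


def mount_all(count, send=send_http, delay=0.01, first_port=17830):
    """PUT `count` nodes one after another; return how many succeeded before a failure."""
    mounted = 0
    for i in range(count):
        # Hypothetical naming scheme: node-0, node-1, ... on consecutive ports.
        status = send(build_request(f"node-{i}", first_port + i))
        if status not in (200, 201, 204):
            break
        mounted += 1
        time.sleep(delay)  # the 0.01 s pause after every PUT
    return mounted
```

Calling `mount_all(40000)` would reproduce the sequence of PUTs; the return value gives the number of nodes accepted before the first non-success response.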

karaf.log kept printing this warning over and over:

----8<---------

2023-04-17T17:48:18,974 | WARN  | ForkJoinPool.commonPool-worker-9 | AbstractShardBackendResolver     | 194 - org.opendaylight.controller.sal-distributed-datastore - 7.0.4 | Failed to resolve shard
java.util.concurrent.TimeoutException: Connection attempt failed
        at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.wrap(AbstractShardBackendResolver.java:151) ~[?:?]
        at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.onConnectResponse(AbstractShardBackendResolver.java:168) ~[?:?]
        at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.lambda$connectShard$4(AbstractShardBackendResolver.java:161) ~[?:?]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
        at java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483) ~[?:?]
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) ~[?:?]
        at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) ~[?:?]
        at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) ~[?:?]
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) ~[?:?]
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) ~[?:?]
Caused by: akka.pattern.AskTimeoutException: Ask timed out on ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-config/member-1-shard-topology-config#900040822)] after [5000 ms]. Message of type [org.opendaylight.controller.cluster.access.commands.ConnectClientRequest]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
2023-04-17T17:48:25,013 | WARN  | ForkJoinPool.commonPool-worker-9 | AbstractShardBackendResolver     | 194 - org.opendaylight.controller.sal-distributed-datastore - 7.0.4 | Failed to resolve shard
java.util.concurrent.TimeoutException: Connection attempt failed
        at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.wrap(AbstractShardBackendResolver.java:151) ~[?:?]
        at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.onConnectResponse(AbstractShardBackendResolver.java:168) ~[?:?]
        at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractShardBackendResolver.lambda$connectShard$4(AbstractShardBackendResolver.java:161) ~[?:?]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
        at java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483) ~[?:?]
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) ~[?:?]
        at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) ~[?:?]
        at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) ~[?:?]
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) ~[?:?]
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) ~[?:?]

----->8----------------
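
When going through a complete karaf.log, it can help to list what was logged at WARN or above before the first "Failed to resolve shard" entry. The following is a minimal sketch under the assumption that every entry uses the same pipe-separated layout as the excerpt above (timestamp | level | thread | logger | bundle | message); the function name and the log path in the usage comment are hypothetical.

```python
from datetime import datetime


def parse_entry(line):
    """Return (timestamp, level, logger, message) for a log entry line, else None."""
    parts = [p.strip() for p in line.split("|", 5)]
    if len(parts) != 6:
        return None  # stack-trace or continuation line
    try:
        ts = datetime.strptime(parts[0], "%Y-%m-%dT%H:%M:%S,%f")
    except ValueError:
        return None
    return ts, parts[1], parts[3], parts[5]


def problems_before(lines, marker="Failed to resolve shard",
                    levels=("WARN", "ERROR")):
    """Collect WARN/ERROR entries that appear before the first marker entry."""
    found = []
    for line in lines:
        entry = parse_entry(line.rstrip("\n"))
        if entry is None:
            continue
        if marker in entry[3]:
            break  # stop at the first occurrence of the shard-resolution warning
        if entry[1] in levels:
            found.append(entry)
    return found


# Usage (the path is an assumption about where Karaf writes its log):
# with open("data/log/karaf.log", encoding="utf-8") as f:
#     for ts, level, logger, msg in problems_before(f):
#         print(ts, level, logger, msg)
```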

 

Why is this Akka timeout seen in a single-node deployment? Is there any way to disable Akka for single-node deployments?

 Comments   
Comment by Robert Varga [ 18/Apr/23 ]

I suspect there is an earlier problem, so we will need a complete karaf.log.

As for selecting the datastore implementation: that is not possible with how feature.xml files are currently generated, so it is always sal-distributed-datastore, and it has been this way since forever. Changing that requires, as a start, five things:

  • ODLPARENT-257, so that odl-mdsal-broker can expose a DataBroker instance and be tested with mdsal-inmemory-datastore
  • package mdsal-dom-inmemory-datastore as a standalone feature
  • implement persistence in it, unless you are willing to accept the config datastore being wiped on restarts
  • refactor all features that depend on DataBroker to use a feature requirement on a datastore implementation (that means moving to template-feature-parent) rather than on odl-mdsal-broker, as they currently do
  • update installation instructions to reflect the need to select a datastore, or make some other end-user visible change to differentiate single-node and multi-node deployments
Comment by Rohini Ambika [ 05/May/23 ]

Can you share the akka.conf file used in your setup?

Generated at Wed Feb 07 23:09:06 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.