CONTROLLER-1723 (project: controller)

RequestTimeoutException after ~250s following removal of a module shard replica


Details

    • Bug
    • Status: Resolved
    • Resolution: Done
    • clustering
    • Operating System: All
      Platform: All

    • 8704

    Description

      This appeared in a test case that was previously failing on CONTROLLER-1707, but there have not been enough runs yet to estimate how frequent this new bug is.

      The scenario uses module-based shards (tell-based protocol), with one transaction writer running on each member and a DTCL (DataTreeChangeListener) registered on the shard leader node. (When the listener is on a follower node, shard replica removal has not led to failures; the listener only detects CONTROLLER-1711.)

      The writer response [0] is:
      {"errors":{"error":[{"error-type":"application","error-tag":"operation-failed","error-message":"Unexpected-exception","error-info":"TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.cluster.access.client.RequestTimeoutException: Timed out after 253.973720644seconds]]}
          at org.opendaylight.controller.md.sal.dom.broker.impl.TransactionCommitFailedExceptionMapper.newWithCause(TransactionCommitFailedExceptionMapper.java:37)
          at org.opendaylight.controller.md.sal.dom.broker.impl.TransactionCommitFailedExceptionMapper.newWithCause(TransactionCommitFailedExceptionMapper.java:18)
          at org.opendaylight.yangtools.util.concurrent.ExceptionMapper.apply(ExceptionMapper.java:102)
          at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker.handleException(ConcurrentDOMDataBroker.java:222)
          at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker.access$000(ConcurrentDOMDataBroker.java:46)
          at org.opendaylight.controller.cluster.databroker.ConcurrentDOMDataBroker$1.onFailure(ConcurrentDOMDataBroker.java:126)
          at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
          at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
          at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
          at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
          at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
          at com.google.common.util.concurrent.SettableFuture.setException(SettableFuture.java:68)
          at org.opendaylight.controller.cluster.databroker.actors.dds.AbstractProxyTransaction.lambda$directCommit$4(AbstractProxyTransaction.java:473)
          at org.opendaylight.controller.cluster.databroker.actors.dds.RemoteProxyTransaction.lambda$handleReplayedRemoteRequest$12(RemoteProxyTransaction.java:442)
          at org.opendaylight.controller.cluster.access.client.ConnectionEntry.complete(ConnectionEntry.java:50)
          at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.lockedCheckTimeout(AbstractClientConnection.java:372)
          at org.opendaylight.controller.cluster.access.client.AbstractClientConnection.runTimer(AbstractClientConnection.java:303)
          at org.opendaylight.controller.cluster.access.client.ClientActorBehavior.onReceiveCommand(ClientActorBehavior.java:121)
          at org.opendaylight.controller.cluster.access.client.ClientActorBehavior.onReceiveCommand(ClientActorBehavior.java:45)
          at org.opendaylight.controller.cluster.access.client.AbstractClientActor.onReceiveCommand(AbstractClientActor.java:59)
          at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)
          at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
          at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
          at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)
          at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:727)
          at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:183)
          at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)
          at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
          at akka.actor.ActorCell.invoke(ActorCell.scala:495)
          at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
          at akka.dispatch.Mailbox.run(Mailbox.scala:224)
          at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
          at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
          at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
          at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
          at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
      Caused by: org.opendaylight.controller.cluster.access.client.RequestTimeoutException: Timed out after 253.973720644seconds
          ... 21 more
      "}]}}
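
      The trace shows the failure being raised from the client actor's timer path (AbstractClientConnection.runTimer calling lockedCheckTimeout, which completes the ConnectionEntry) rather than from a backend response. The Java sketch below is a simplified, hypothetical model of such a periodic check, not the controller's implementation: the class TimeoutChecker, its methods, and the 120 s constant (the timeout CONTROLLER-1707 reports as expected for an unsuccessful reconnect) are assumptions. It only illustrates that the reported figure is the elapsed time at the moment the check runs, so it can be well above the nominal timeout if the check runs late.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a timer-driven request timeout check (not the controller's code). */
public class TimeoutChecker {
    // Assumed nominal timeout, matching the 120 s seen in CONTROLLER-1707.
    static final long REQUEST_TIMEOUT_NANOS = TimeUnit.SECONDS.toNanos(120);

    /** A queued request remembers when it was first enqueued. */
    private static final class PendingRequest {
        final String id;
        final long enqueuedNanos;

        PendingRequest(String id, long enqueuedNanos) {
            this.id = id;
            this.enqueuedNanos = enqueuedNanos;
        }
    }

    private final Deque<PendingRequest> pending = new ArrayDeque<>();

    /** Enqueue a request; callers pass a monotonically increasing clock value. */
    public void enqueue(String id, long nowNanos) {
        pending.addLast(new PendingRequest(id, nowNanos));
    }

    /**
     * Called whenever the timer fires. Requests at least as old as the timeout are
     * removed and reported with the actual elapsed time, which is >= the timeout
     * and keeps growing the later this method runs.
     */
    public List<String> checkTimeouts(long nowNanos) {
        List<String> failures = new ArrayList<>();
        while (!pending.isEmpty()) {
            PendingRequest head = pending.peekFirst();
            long elapsed = nowNanos - head.enqueuedNanos;
            if (elapsed < REQUEST_TIMEOUT_NANOS) {
                break; // FIFO queue: every later entry is younger than the head
            }
            pending.removeFirst();
            failures.add(head.id + ": Timed out after " + elapsed / 1e9 + "seconds");
        }
        return failures;
    }

    public static void main(String[] args) {
        TimeoutChecker checker = new TimeoutChecker();
        checker.enqueue("commit-1", 0L);
        // Before 120 s nothing fails.
        System.out.println(checker.checkTimeouts(TimeUnit.SECONDS.toNanos(100)));
        // If the next check only runs at 254 s, the reported duration is 254 s, not 120 s.
        System.out.println(checker.checkTimeouts(TimeUnit.SECONDS.toNanos(254)));
    }
}
```

      Whether a late or skipped check is what actually happens in this bug is not established by this report; the model only shows how a reported duration can exceed the configured one.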

      This bug shares similarities with several known bugs:

      The writer response is almost identical to the one in CONTROLLER-1707, but that bug timed out after 120s (as expected for an unsuccessful client reconnect) and is supposedly fixed. Also, CONTROLLER-1707 did not occur when the leader was moved onto the writer node.

      It is not known whether ~250 seconds is a stable or a fluctuating value. If it fluctuates, this may be a less severe case of CONTROLLER-1708, even though that bug is observed in isolation scenarios.

      Also, if a RequestTimeoutException can occur with a duration significantly longer than 120s, it may explain CONTROLLER-1687.

      Update: a Sandbox run with verbose logs encountered this bug [1] with a duration of 133 seconds.
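
      To tell whether the duration is stable or fluctuating, the reported durations can be collected from the writer response of each failing run and compared with the 120s expected for an unsuccessful reconnect. A minimal Java sketch, assuming the message format shown in response [0] ("Timed out after 253.973720644seconds"); the class TimeoutDurations and its methods are hypothetical helpers, not part of the controller or the CSIT suite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extract RequestTimeoutException durations from writer responses. */
public class TimeoutDurations {
    // Assumed message format, e.g. "RequestTimeoutException: Timed out after 253.973720644seconds".
    private static final Pattern TIMED_OUT = Pattern.compile(
        "RequestTimeoutException: Timed out after (\\d+(?:\\.\\d+)?)\\s*seconds");

    // Timeout expected for an unsuccessful client reconnect (per CONTROLLER-1707).
    static final double EXPECTED_SECONDS = 120.0;

    /** Returns every duration, in seconds, reported in the given text. */
    public static List<Double> durations(String response) {
        List<Double> result = new ArrayList<>();
        Matcher m = TIMED_OUT.matcher(response);
        while (m.find()) {
            result.add(Double.parseDouble(m.group(1)));
        }
        return result;
    }

    /** How far the first reported duration exceeds 120 s, or NaN if none is found. */
    public static double overshootSeconds(String response) {
        List<Double> found = durations(response);
        return found.isEmpty() ? Double.NaN : found.get(0) - EXPECTED_SECONDS;
    }

    /** Max minus min across runs; a large spread suggests a fluctuating value. */
    public static double spread(List<Double> values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return values.isEmpty() ? 0.0 : max - min;
    }

    public static void main(String[] args) {
        // Excerpt from writer response [0].
        String run750 = "cause=org.opendaylight.controller.cluster.access.client."
            + "RequestTimeoutException: Timed out after 253.973720644seconds]]}";
        System.out.println(durations(run750));
        System.out.println(overshootSeconds(run750));
        // The two durations reported so far: run [0] and the Sandbox run [1].
        System.out.println(spread(List.of(253.973720644, 133.0)));
    }
}
```

      Note that the full response contains the message twice (in the RpcError cause and in the "Caused by" line), so durations() returns two equal values for it; overshootSeconds() uses only the first.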

      [0] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/750/log.html.gz#s1-s36-t1-k2-k15-k3-k1-k1
      [1] https://logs.opendaylight.org/sandbox/jenkins091/controller-csit-3node-clustering-only-carbon/35/log.html.gz#s1-s36-t1-k2-k15-k3-k1-k1

      People

        Robert Varga (rovarga)
        Vratko Polak (vrpolak)