  1. controller
  2. CONTROLLER-1070

Clustering: Robot integration tests failing


Details

    • Bug
    • Status: Resolved
    • Resolution: Won't Do
    • Helium
    • None
    • mdsal
    • None
    • Operating System: All
      Platform: All

    • 2516
    • Normal

    Description

      The clustering integration tests have been failing for a while now. The "010 Credential Authentication" AAA test always fails - I assume this is an issue with the test setup.

      Of more concern are the sporadic failures, specifically "Inventory Scalability OF10". An example can be seen at
      https://jenkins.opendaylight.org/integration/job/integration-master-csit-cluster-min/198/robot/report/log.html#s1-s6-s1-t4.

      The "Get Stats for a node" test case looks for "flow-capable-node-connector-statistics" in the REST output. This query is repeated for 2 minutes, waiting for it to succeed.
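The retry behaviour described above can be sketched roughly as follows. This is not the actual Robot keyword used by the test suite; `wait_until_contains`, its parameters and the injected `fetch` callable are hypothetical names chosen for illustration, and the 120-second default simply mirrors the 2 minutes mentioned above.

```python
import time
from typing import Callable


def wait_until_contains(
    fetch: Callable[[], str],
    needle: str,
    timeout_s: float = 120.0,   # assumption: mirrors the 2-minute wait in the test
    interval_s: float = 1.0,    # assumption: the real retry interval is not stated
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call fetch() repeatedly until its result contains needle.

    Returns True as soon as needle is found, or False once timeout_s
    has elapsed without a match.
    """
    deadline = clock() + timeout_s
    while True:
        if needle in fetch():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In the real test, `fetch` would be the REST query against the node's statistics; here it is left as a parameter so the sketch makes no assumption about the URL or credentials. A return value of False corresponds to the timeout failure seen in the Jenkins report.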

      Looking at the karaf.log, the following OptimisticLockFailedException appears to correspond to the test timeout failure:

      2014-12-12 15:30:56,993 | WARN | lt-dispatcher-30 | InMemoryDOMDataStore | 145 - org.opendaylight.controller.sal-inmemory-datastore - 1.2.0.SNAPSHOT | Store Tx: member-1-shard-inventory-operational-441 Conflicting modification for /(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:2}].
      2014-12-12 15:30:56,994 | WARN | lt-dispatcher-70 | ConcurrentDOMDataBroker | 139 - org.opendaylight.controller.sal-broker-impl - 1.2.0.SNAPSHOT | Tx: DOM-CHAIN-0-217 Error during phase CAN_COMMIT, starting Abort
      java.util.concurrent.ExecutionException: OptimisticLockFailedException{message=Optimistic lock failed., errorList=[RpcError [message=Optimistic lock failed., severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was deleted by other transaction.]]}

      at com.google.common.util.concurrent.Futures$ImmediateFailedFuture.get(Futures.java:183)[52:com.google.guava:14.0.1]
      at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.doCanCommit(ShardCommitCoordinator.java:138)[259:org.opendaylight.controller.sal-distributed-datastore:1.2.0.SNAPSHOT]
      at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleCanCommit(ShardCommitCoordinator.java:128)[259:org.opendaylight.controller.sal-distributed-datastore:1.2.0.SNAPSHOT]
      at org.opendaylight.controller.cluster.datastore.Shard.handleCanCommitTransaction(Shard.java:389)[259:org.opendaylight.controller.sal-distributed-datastore:1.2.0.SNAPSHOT]

      ...

      2014-12-12 15:30:57,003 | WARN | ds-oper-thread-0 | StatisticsManagerImpl | 153 - org.opendaylight.controller.md.statistics-manager - 1.2.0.SNAPSHOT | Unhandled exception during processing statistics. Restarting transaction chain.
      ...
      2014-12-12 15:30:57,003 | WARN | CommitFutures-2 | StatisticsManagerImpl | 153 - org.opendaylight.controller.md.statistics-manager - 1.2.0.SNAPSHOT | Failed to export Flow Capable Statistics, Transaction DOM-CHAIN-0-217 failed.
      OptimisticLockFailedException{message=Optimistic lock failed., errorList=[RpcError [message=Optimistic lock failed., severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was deleted by other transaction.]]}

      Note the "Node was deleted by other transaction." error message. This appears to indicate that some code (StatisticsManager?) is trying to put stats under an OF node (id=openflow:2) but some parent node doesn't exist (probably the "openflow:2" Node itself), either because:
      1) it was deleted
      2) it hadn't been inserted yet
      3) a previous insert was attempted but failed

      It doesn't appear to be #3 because I don't see any previous commit failures in the log.
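Possibility 1 can be illustrated with a toy model of optimistic concurrency. This is only a sketch and not how the in-memory datastore is implemented: `ToyStore`, `ToyTx` and `ConflictingModification` are hypothetical names, paths are plain tuples, and the check covers only the "parent was deleted after the snapshot was taken" case.

```python
class ConflictingModification(Exception):
    """Raised when a transaction's snapshot no longer matches the store."""


class ToyTx:
    """A transaction that works against a snapshot taken when it was created."""

    def __init__(self, store):
        self.snapshot = dict(store.nodes)
        self.writes = {}    # path -> value
        self.deletes = []   # paths whose subtrees are removed

    def put(self, path, value):
        self.writes[path] = value

    def delete(self, path):
        self.deletes.append(path)


class ToyStore:
    def __init__(self):
        self.nodes = {}     # path (tuple of str) -> value

    def commit(self, tx):
        # "Can commit" check: a parent the transaction saw in its snapshot
        # must still exist, otherwise another transaction deleted it.
        for path in tx.writes:
            parent = path[:-1]
            if parent in tx.snapshot and parent not in self.nodes:
                raise ConflictingModification(
                    "/".join(parent) + " was deleted by another transaction"
                )
        # Apply deletes (including children), then writes.
        for path in tx.deletes:
            for p in [p for p in self.nodes if p[: len(path)] == path]:
                del self.nodes[p]
        self.nodes.update(tx.writes)


store = ToyStore()
node = ("nodes", "openflow:2")

setup = ToyTx(store)
setup.put(node, {})
store.commit(setup)

stats_tx = ToyTx(store)                  # snapshot still contains openflow:2
stats_tx.put(node + ("stats",), {"rx": 1})

remove_tx = ToyTx(store)
remove_tx.delete(node)
store.commit(remove_tx)                  # the delete is committed first

conflict = None
try:
    store.commit(stats_tx)               # the stats write now conflicts
except ConflictingModification as exc:
    conflict = str(exc)
```

In this model the stats transaction fails only because the delete was committed between its snapshot and its commit, which is the kind of interleaving the log messages above hint at. It says nothing about which component issued the delete.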

      There are also sporadic failures in "Compatible.AD SAL NSF OF10" - https://jenkins.opendaylight.org/integration/job/integration-master-csit-cluster-min/205/robot/report/log.html - where "FlowProgrammer.Check flow in flow stats" and "StatisticsManager.get port stats" fail.

      These same tests have not been failing without clustering.


          People

            tpantelis Tom Pantelis
            tpantelis Tom Pantelis
