Details
Bug
Status: Resolved
Resolution: Won't Do
Helium
Operating System: All
Platform: All
2516
Normal
Description
The clustering integration tests have been failing for a while now. The "010 Credential Authentication" AAA test always fails - I assume this is an issue with the test setup.
Of more concern are the sporadic failures, specifically "Inventory Scalability OF10". An example can be seen at
https://jenkins.opendaylight.org/integration/job/integration-master-csit-cluster-min/198/robot/report/log.html#s1-s6-s1-t4.
The "Get Stats for a node" test case looks for "flow-capable-node-connector-statistics" in the REST output. This query is retried for up to 2 minutes, waiting for it to succeed.
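For readers unfamiliar with this kind of check, the retry behaviour can be sketched roughly as below. This is only an illustration, not the actual Robot test code: `wait_for_substring`, the `fetch` callable, and the 1-second interval are assumptions; only the 2-minute limit and the search string come from the test described above.

```python
import time
from typing import Callable


def wait_for_substring(fetch: Callable[[], str], needle: str,
                       timeout_s: float = 120.0, interval_s: float = 1.0,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll fetch() until needle shows up in its output or timeout_s elapses.

    fetch is a hypothetical callable that returns the REST response body
    as text (for example, a GET on the node's operational data).
    """
    deadline = clock() + timeout_s
    while True:
        if needle in fetch():
            return True          # stats appeared, the test step passes
        if clock() >= deadline:
            return False         # gave up: this is the timeout failure
        sleep(interval_s)
```

With a real HTTP client, `fetch` would wrap the REST query and `needle` would be "flow-capable-node-connector-statistics". If the stats are never written, the loop runs for the full timeout and returns False.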
Looking at the karaf.log, the following OptimisticLockFailedException appears to correspond to the test timeout failure:
2014-12-12 15:30:56,993 | WARN | lt-dispatcher-30 | InMemoryDOMDataStore | 145 - org.opendaylight.controller.sal-inmemory-datastore - 1.2.0.SNAPSHOT | Store Tx: member-1-shard-inventory-operational-441 Conflicting modification for /(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:2}].
2014-12-12 15:30:56,994 | WARN | lt-dispatcher-70 | ConcurrentDOMDataBroker | 139 - org.opendaylight.controller.sal-broker-impl - 1.2.0.SNAPSHOT | Tx: DOM-CHAIN-0-217 Error during phase CAN_COMMIT, starting Abort
java.util.concurrent.ExecutionException: OptimisticLockFailedException
at com.google.common.util.concurrent.Futures$ImmediateFailedFuture.get(Futures.java:183)[52:com.google.guava:14.0.1]
at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.doCanCommit(ShardCommitCoordinator.java:138)[259:org.opendaylight.controller.sal-distributed-datastore:1.2.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.handleCanCommit(ShardCommitCoordinator.java:128)[259:org.opendaylight.controller.sal-distributed-datastore:1.2.0.SNAPSHOT]
at org.opendaylight.controller.cluster.datastore.Shard.handleCanCommitTransaction(Shard.java:389)[259:org.opendaylight.controller.sal-distributed-datastore:1.2.0.SNAPSHOT]
...
2014-12-12 15:30:57,003 | WARN | ds-oper-thread-0 | StatisticsManagerImpl | 153 - org.opendaylight.controller.md.statistics-manager - 1.2.0.SNAPSHOT | Unhandled exception during processing statistics. Restarting transaction chain.
...
2014-12-12 15:30:57,003 | WARN | CommitFutures-2 | StatisticsManagerImpl | 153 - org.opendaylight.controller.md.statistics-manager - 1.2.0.SNAPSHOT | Failed to export Flow Capable Statistics, Transaction DOM-CHAIN-0-217 failed.
OptimisticLockFailedException
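To line up log entries like these with the time a test step failed, a small script along the following lines may help. It is a sketch that assumes the pipe-separated layout seen in the excerpt above (timestamp, level, thread, logger, bundle, message). The helper names `parse_karaf_line` and `warnings_between` are made up for illustration.

```python
from datetime import datetime


def parse_karaf_line(line):
    """Split one karaf.log line into fields; return None for continuation lines."""
    parts = [p.strip() for p in line.split("|", 5)]
    if len(parts) < 6:
        return None  # stack-trace lines and "..." have no pipe-separated header
    try:
        ts = datetime.strptime(parts[0], "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None
    return {"time": ts, "level": parts[1], "thread": parts[2],
            "logger": parts[3], "bundle": parts[4], "message": parts[5]}


def warnings_between(lines, start, end):
    """Return parsed WARN entries whose timestamp falls within [start, end]."""
    hits = []
    for line in lines:
        entry = parse_karaf_line(line)
        if entry and entry["level"] == "WARN" and start <= entry["time"] <= end:
            hits.append(entry)
    return hits
```

Called with the lines of karaf.log and the window in which the Robot step timed out, this would list the WARN entries to inspect, such as the ConcurrentDOMDataBroker and StatisticsManagerImpl messages above.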
Note the "Node was deleted by other transaction." error message. This appears to indicate that some code (StatisticsManager?) tried to put stats under an OF node (id=openflow:2) whose parent node doesn't exist (probably the "openflow:2" Node itself), either because:
1) it was deleted
2) it hadn't been inserted yet
3) a previous insert was attempted but failed
It doesn't appear to be #3 because I don't see any previous commit failures in the log.
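The three possibilities can be illustrated with a toy model. This is deliberately not the ODL datastore or its API: `ToyStore`, `ConflictError`, and the method names are invented here, and the model only assumes that a stats write is rejected when its parent node is absent at commit time.

```python
class ConflictError(Exception):
    """Raised when a commit's precondition no longer holds."""


class ToyStore:
    """Toy stand-in for a datastore holding nodes keyed by id (not ODL code)."""

    def __init__(self):
        self.nodes = {}

    def put_node(self, node_id):
        self.nodes.setdefault(node_id, {})

    def delete_node(self, node_id):
        self.nodes.pop(node_id, None)

    def begin(self):
        # Snapshot of which parent nodes exist when the writer starts.
        return set(self.nodes)

    def commit_stats(self, snapshot, node_id, stats):
        if node_id not in self.nodes:
            if node_id in snapshot:
                # Case 1: the parent existed at start but was removed since.
                raise ConflictError("parent deleted by another writer")
            # Cases 2 and 3: the parent was never (successfully) inserted.
            raise ConflictError("parent not present")
        self.nodes[node_id].update(stats)
```

In this model, cases 2 and 3 look identical to the stats writer. Telling them apart requires checking the log for an earlier failed insert, which is the check described above.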
There are also sporadic failures in "Compatible.AD SAL NSF OF10" - https://jenkins.opendaylight.org/integration/job/integration-master-csit-cluster-min/205/robot/report/log.html - where "FlowProgrammer.Check flow in flow stats" and "StatisticsManager.get port stats" fail.
These same tests have not been failing without clustering.