[OPNFLWPLUG-616] Longevity test rarely goes beyond 6 hours
Created: 08/Feb/16  Updated: 27/Sep/21  Resolved: 21/Nov/16

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Luis Gomez
Assignee: Luis Gomez
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Blocks
is blocked by MDSAL-197 Switch connection bounce results in w... Resolved
External issue ID: 5271

 Description   

The latest results indicate that the switch count is incorrect (0 != 200).

https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-longevity-lithium-redesign-only-beryllium/buildTimeTrend



 Comments   
Comment by Luis Gomez [ 08/Feb/16 ]

After analyzing the test results in more detail, the issue seems to be in the topology: it shows no links after a few iterations of bringing 200 switches in a linear topology up and down.

Comment by Luis Gomez [ 13/Sep/16 ]

It seems that most of the time the longevity test fails because of this ERROR:

2016-09-12 07:00:16,514 | ERROR | pool-32-thread-1 | SalRoleServiceImpl | 183 - org.opendaylight.openflowplugin.impl - 0.3.0.SNAPSHOT | SetRoleService set Role BECOMESLAVE for Node: KeyedInstanceIdentifier{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:33]]]]} fail . Reason java.util.concurrent.CancellationException: Task was cancelled.
2016-09-12 07:00:17,233 | ERROR | pool-32-thread-1 | SalRoleServiceImpl | 183 - org.opendaylight.openflowplugin.impl - 0.3.0.SNAPSHOT | SetRoleService set Role BECOMESLAVE for Node: KeyedInstanceIdentifier{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:201]]]]} fail . Reason java.util.concurrent.CancellationException: Task was cancelled.
2016-09-12 07:00:17,235 | ERROR | pool-32-thread-1 | ExecutionList | 41 - com.google.guava - 18.0.0 | RuntimeException while executing runnable com.google.common.util.concurrent.Futures$6@4f23900c with executor INSTANCE
java.lang.NullPointerException
at org.opendaylight.openflowplugin.impl.device.DeviceContextImpl.shutdownConnection(DeviceContextImpl.java:568)
at org.opendaylight.openflowplugin.impl.lifecycle.LifecycleServiceImpl.closeConnection(LifecycleServiceImpl.java:138)
at org.opendaylight.openflowplugin.impl.role.RoleManagerImpl$1.onFailure(RoleManagerImpl.java:86)
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1313)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
at com.google.common.util.concurrent.AbstractFuture.cancel(AbstractFuture.java:134)
at org.opendaylight.openflowplugin.impl.role.RoleContextImpl.lambda$sendRoleChangeToDevice$0(RoleContextImpl.java:177)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:581)[115:io.netty.common:4.0.37.Final]
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:655)[115:io.netty.common:4.0.37.Final]
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:367)[115:io.netty.common:4.0.37.Final]
at java.lang.Thread.run(Thread.java:745)[:1.8.0_101]

This seems to be the same as, or similar to, the ERROR I see in the scalability test:

https://bugs.opendaylight.org/show_bug.cgi?id=6665
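The stack trace above shows the chain: a HashedWheelTimer timeout cancels the role-change future (RoleContextImpl.sendRoleChangeToDevice), Guava runs the registered failure callback directly on the timer thread (RoleManagerImpl onFailure), and that callback reaches DeviceContextImpl.shutdownConnection, where the NullPointerException is thrown. The minimal Java 8 sketch below reproduces that shape, assuming java.util.concurrent.CompletableFuture as a stand-in for Guava's ListenableFuture. DeviceContext, Connection and cancelAndObserve are hypothetical names, not OpenFlowPlugin classes, and the null guard only illustrates the kind of check that avoids the NPE; it is an assumption, not the content of the patch referenced later in this issue.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

public class CancellationCallbackDemo {

    // Hypothetical stand-in for a switch connection (not an OpenFlowPlugin class).
    static final class Connection {
        boolean closed;

        void close() {
            closed = true;
        }
    }

    // Hypothetical stand-in for a device context whose connection is cleared
    // (set to null) once the switch has already disconnected.
    static final class DeviceContext {
        volatile Connection connection;

        // Null-safe shutdown: reports "nothing to close" instead of throwing an NPE.
        boolean shutdownConnection() {
            Connection current = connection;
            if (current == null) {
                return false;
            }
            current.close();
            return true;
        }
    }

    // Registers a failure callback on a pending "role change" future, cancels the
    // future from a separate "timer" thread, and returns the name of the thread on
    // which the callback actually ran. The observed failure is stored in `seen`.
    static String cancelAndObserve(DeviceContext ctx, AtomicReference<Throwable> seen) {
        CompletableFuture<Void> roleChange = new CompletableFuture<>();
        AtomicReference<String> callbackThread = new AtomicReference<>();

        roleChange.whenComplete((ignored, failure) -> {
            if (failure instanceof CancellationException) {
                seen.set(failure);
                callbackThread.set(Thread.currentThread().getName());
                // Without the null check inside shutdownConnection(), this call would
                // throw on the timer thread when the connection is already gone.
                ctx.shutdownConnection();
            }
        });

        // Simulates the timeout firing: cancel() completes the future and runs the
        // already-registered callback synchronously on this thread.
        Thread timer = new Thread(() -> roleChange.cancel(true), "timer-thread");
        timer.start();
        try {
            timer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return callbackThread.get();
    }

    public static void main(String[] args) {
        DeviceContext ctx = new DeviceContext(); // connection is null: switch already gone
        AtomicReference<Throwable> seen = new AtomicReference<>();
        String thread = cancelAndObserve(ctx, seen);
        System.out.println("callback ran on " + thread + " with " + seen.get());
    }
}
```

Because cancel() runs already-registered callbacks synchronously, any cleanup in the failure callback executes on whichever thread fired the timeout, so it has to tolerate state that a concurrent disconnect may already have torn down.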

Comment by Luis Gomez [ 13/Sep/16 ]

I am raising the priority of this bug to Critical so that it gets fixed for Boron SR1.

Comment by Tomas Slusny [ 13/Sep/16 ]

Actually, it is exactly the same ERROR message as in https://bugs.opendaylight.org/show_bug.cgi?id=6672. I think I will have a patch ready for 6672 today that fixes the cause of that ERROR message, so it should hopefully fix this issue as well.

Comment by Luis Gomez [ 13/Sep/16 ]

Cool

Comment by Tomas Slusny [ 13/Sep/16 ]

This patch should fix this: https://git.opendaylight.org/gerrit/#/c/45526/

I already posted it in OPNFLWPLUG-767, where it is only a partial fix, but it should fix this one (it prevents that NPE from occurring).

Comment by Priya Ramasubbu [ 20/Sep/16 ]

Luis,

In the recent runs of
https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-longevity-lithium-redesign-only-beryllium/buildTimeTrend

the java.lang.NullPointerException and java.util.concurrent.CancellationException have not been reproduced in the karaf logs.

Instead, I can see other exceptions, such as the one below:

2016-09-20 04:15:29,082 | ERROR | lt-dispatcher-16 | RpcBroker | 159 - org.opendaylight.controller.sal-remoterpc-connector - 1.3.4.SNAPSHOT | executeRpc for (urn:opendaylight:packet:service?revision=2013-07-09)transmit-packet failed with root cause: org.opendaylight.controller.md.sal.dom.api.DOMRpcImplementationNotAvailableException: Rpc implementation for {} was removed during processing.. For exception details, enable Debug logging.

https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-1node-periodic-longevity-lithium-redesign-only-beryllium/398/archives/karaf.log.gz

Hopefully we can close the current bug (5271) and open a new one against MD-SAL.

Comment by Tomas Slusny [ 22/Sep/16 ]

In the latest Jenkins logs I see a VerifyException from the singleton service, which is caused by MDSAL-197.

Comment by Tomas Slusny [ 07/Oct/16 ]

See my last reply in OPNFLWPLUG-767.

Comment by Luis Gomez [ 21/Nov/16 ]

This is fixed now. The latest runs on Boron/Carbon lasted the targeted 23 hours.

Generated at Wed Feb 07 20:32:56 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.