[OPNFLWPLUG-760] OVS Port remains DOWN in OpenFlow inventory when using Mininet 2.1.0 with OVS 2.0 OF1.0 Created: 31/Aug/16  Updated: 27/Sep/21  Resolved: 16/Oct/17

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Karthik Sivasamy Assignee: Karthik Sivasamy
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 6595

 Description   

The latest regression runs for the VTN CSIT jobs report failures in normal flow installation when testing with OF1.0 switches. The same jobs do not report failures when testing with OF1.3 switches.

VTN CSIT jobs reporting failures:
https://jenkins.opendaylight.org/releng/view/vtn/job/vtn-csit-1node-manager-only-boron/637/
https://jenkins.opendaylight.org/releng/view/vtn/job/vtn-csit-1node-coordinator-only-boron/209/

Karaf log URLs:
https://logs.opendaylight.org/releng/jenkins092/vtn-csit-1node-manager-only-boron/637/archives/karaf.log.gz
https://logs.opendaylight.org/releng/jenkins092/vtn-csit-1node-coordinator-only-boron/209/archives/karaf.log.gz

Observation

During the failure scenario, the following error was observed in the log:

2016-08-28 19:34:22,691 | WARN | entLoopGroup-5-3 | DeviceContextImpl | 183 - org.opendaylight.openflowplugin.impl - 0.3.0.SNAPSHOT | Error processing port status message:
org.opendaylight.controller.md.sal.common.api.data.TransactionChainClosedException: Cannot write into transaction.
at org.opendaylight.openflowplugin.impl.device.TransactionChainManager.writeToTransaction(TransactionChainManager.java:233)[183:org.opendaylight.openflowplugin.impl:0.3.0.SNAPSHOT]
at org.opendaylight.openflowplugin.impl.device.DeviceContextImpl.writeToTransaction(DeviceContextImpl.java:223)[183:org.opendaylight.openflowplugin.impl:0.3.0.SNAPSHOT]
at org.opendaylight.openflowplugin.impl.device.DeviceContextImpl.processPortStatusMessage(DeviceContextImpl.java:331)[183:org.opendaylight.openflowplugin.impl:0.3.0.SNAPSHOT]
at org.opendaylight.openflowplugin.impl.device.listener.OpenflowProtocolListenerFullImpl.onPortStatusMessage(OpenflowProtocolListenerFullImpl.java:94)[183:org.opendaylight.openflowplugin.impl:0.3.0.SNAPSHOT]
at org.opendaylight.openflowjava.protocol.impl.core.connection.ConnectionAdapterImpl.consumeDeviceMessage(ConnectionAdapterImpl.java:129)[173:org.opendaylight.openflowjava.openflow-protocol-impl:0.8.0.SNAPSHOT]
at org.opendaylight.openflowjava.protocol.impl.core.connection.AbstractConnectionAdapterStatistics.consume(AbstractConnectionAdapterStatistics.java:66)[173:org.opendaylight.openflowjava.openflow-protocol-impl:0.8.0.SNAPSHOT]
at org.opendaylight.openflowjava.protocol.impl.core.connection.ConnectionAdapterImpl.consume(ConnectionAdapterImpl.java:43)[173:org.opendaylight.openflowjava.openflow-protocol-impl:0.8.0.SNAPSHOT]
at org.opendaylight.openflowjava.protocol.impl.core.DelegatingInboundHandler.channelRead(DelegatingInboundHandler.java:46)[173:org.opendaylight.openflowjava.openflow-protocol-impl:0.8.0.SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)[116:io.netty.transport:4.0.37.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)[116:io.netty.transport:4.0.37.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)[116:io.netty.transport:4.0.37.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)[116:io.netty.transport:4.0.37.Final]
at org.opendaylight.openflowjava.protocol.impl.core.connection.AbstractOutboundQueueManager.channelRead(AbstractOutboundQueueManager.java:195)[173:org.opendaylight.openflowjava.openflow-protocol-impl:0.8.0.SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)[116:io.netty.transport:4.0.37.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)[116:io.netty.transport:4.0.37.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)[116:io.netty.transport:4.0.37.Final]
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)[117:io.netty.codec:4.0.37.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)[116:io.netty.transport:4.0.37.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)[116:io.netty.transport:4.0.37.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)[116:io.netty.transport:4.0.37.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)[117:io.netty.codec:4.0.37.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)[117:io.netty.codec:4.0.37.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)[116:io.netty.transport:4.0.37.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)[116:io.netty.transport:4.0.37.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)[116:io.netty.transport:4.0.37.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)[117:io.netty.codec:4.0.37.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)[117:io.netty.codec:4.0.37.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)[116:io.netty.transport:4.0.37.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)[116:io.netty.transport:4.0.37.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)[116:io.netty.transport:4.0.37.Final]
at io.netty.handler.timeout.ReadTimeoutHandler.channelRead(ReadTimeoutHandler.java:152)[119:io.netty.handler:4.0.37.Final]
at org.opendaylight.openflowjava.protocol.impl.core.IdleHandler.channelRead(IdleHandler.java:39)[173:org.opendaylight.openflowjava.openflow-protocol-impl:0.8.0.SNAPSHOT]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)[116:io.netty.transport:4.0.37.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)[116:io.netty.transport:4.0.37.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)[116:io.netty.transport:4.0.37.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1280)[116:io.netty.transport:4.0.37.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)[116:io.netty.transport:4.0.37.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)[116:io.netty.transport:4.0.37.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:890)[116:io.netty.transport:4.0.37.Final]
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:879)[120:io.netty.transport-native-epoll:4.0.37.Final]
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:387)[120:io.netty.transport-native-epoll:4.0.37.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:296)[120:io.netty.transport-native-epoll:4.0.37.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)[115:io.netty.common:4.0.37.Final]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:145)[115:io.netty.common:4.0.37.Final]
at java.lang.Thread.run(Thread.java:745)[:1.8.0_101]

Packets received at the port were dropped because its status was reported as down:

2016-08-28 19:35:13,879 | WARN | Runner: VTN Main | VBridge | 194 - org.opendaylight.vtn.manager.implementation - 0.5.0.SNAPSHOT | vBridge:Tenant1/vBridge1: Drop packet because egress port is down: src=0e:d0:3e:75:86:41, dst=52:5b:30:b8:d8:0b, port=openflow:2:1, type=0x806, vlan=0

This is a blocker for VTN testing with OF1.0 switches.



 Comments   
Comment by Shuva Jyoti Kar [ 01/Sep/16 ]

I have put in a fix to log more information about the port whose status update is being missed:
https://git.opendaylight.org/gerrit/#/c/45024/

Please try it out with this patch and share the logs so we can analyse the failure to process the port status message.

However, since
https://logs.opendaylight.org/releng/jenkins092/vtn-csit-1node-manager-only-boron/637/archives/karaf.log.gz
shows that there are 34 flow addition failures, of which:
a. 19 are due to error type BADACTION, code BADSETARGUMENT
b. 10 are due to Device disconnected
c. 5 are due to java.util.concurrent.CancellationException: Task was cancelled

Could you also take a look and ascertain that:
1. the switch is in a connected state at the moment a flow is pushed;
2. flows that exist in the config DS are written to the switch when it connects back;
3. the port status is reflected correctly, that is, when it is reported up, it is actually up on the switch (OVS);
4. the actions being pushed are supported by the switch (OVS).

Comment by Hideyuki Tai [ 02/Sep/16 ]

I'm sharing the outcome of my investigation so far.

I think the OpenFlow plugin failed to process the port status message because the message arrived from the switch before the plugin started its MASTER services for that switch.
As a result, the OpenFlow plugin failed to notify the VTN feature of the port creation, so the VTN feature did not forward packets as expected.
I suspect that a recently merged patch changed the way the MASTER services are started and introduced this issue.

    1. Detailed information

Here is the detailed information from my investigation.

I've checked the following log file on a failed job.

https://logs.opendaylight.org/sandbox/jenkins091/vtn-csit-3node-manager-all-boron/6/archives/odl1_karaf.log.gz

In this job, the VTN feature wrongly judged that port "openflow:2:1" was down, probably because OFP did not notify the VTN Manager of the port creation.

I saw the following log messages on the log file.

2016-09-01 18:16:24,099 | DEBUG | entLoopGroup-5-3 | TransactionChainManager | 183 - org.opendaylight.openflowplugin.impl - 0.3.0.SNAPSHOT | WriteTx is null for node KeyedInstanceIdentifier{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:2]]]]}. Write data for KeyedInstanceIdentifier{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.node.NodeConnector, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey [_id=Uri [_value=openflow:2]]], org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.node.NodeConnector[key=NodeConnectorKey [_id=Uri [_value=openflow:2:1]]]]} was not realized.
2016-09-01 18:16:24,099 | WARN | entLoopGroup-5-3 | DeviceContextImpl | 183 - org.opendaylight.openflowplugin.impl - 0.3.0.SNAPSHOT | Error processing port status message:
org.opendaylight.controller.md.sal.common.api.data.TransactionChainClosedException: Cannot write into transaction.

I think the OpenFlow plugin failed to process the port status message because it failed to get a WriteTransaction, and it failed to get the WriteTransaction because this happened before it started the MASTER services for the switch "openflow:2".

Actually, I saw the following log message 160 msec after the above log message.

2016-09-01 18:16:24,260 | INFO | ult-dispatcher-2 | LifecycleServiceImpl | 183 - org.opendaylight.openflowplugin.impl - 0.3.0.SNAPSHOT | ========== Starting clustering MASTER services for node openflow:2 ==========

    1. Related code

[openflowplugin-impl/src/main/java/org/opendaylight/openflowplugin/impl/device/TransactionChainManager.java]

223     <T extends DataObject> void writeToTransaction(final LogicalDatastoreType store,
224                                                    final InstanceIdentifier<T> path,
225                                                    final T data,
226                                                    final boolean createParents) {
227         final WriteTransaction writeTx = getTransactionSafely();
228         if (writeTx != null) {
229             LOG.trace("writeToTransaction called with path {} ", path);
230             writeTx.put(store, path, data, createParents);
231         } else {
232             LOG.debug("WriteTx is null for node {}. Write data for {} was not realized.", nodeII, path);
233             throw new TransactionChainClosedException("Cannot write into transaction.");
234         }
235     }

258     @Nullable
259     private WriteTransaction getTransactionSafely() {
260         synchronized (txLock) {
261             if (wTx == null && TransactionChainManagerStatus.WORKING.equals(transactionChainManagerStatus)) {
262                 if (wTx == null && txChainFactory != null) {
263                     wTx = txChainFactory.newWriteOnlyTransaction();
264                 }
265             }
266         }
267         return wTx;
268     }
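Given the analysis above, the failure mode is that a port status message arrives while the transaction chain is not yet WORKING, so writeToTransaction throws and the port update is lost. A minimal sketch of one possible defensive approach is shown below: buffer early port-status messages and replay them once the MASTER services have started. This is an illustration only, not the actual OpenFlowPlugin code; the class and method names are hypothetical, and the real fix would need to hook into DeviceContextImpl and TransactionChainManager.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative sketch only (hypothetical names): instead of throwing
// TransactionChainClosedException when the transaction chain is not yet
// WORKING, buffer early port-status messages and replay them once the
// MASTER services have started.
public class PortStatusBuffer {
    private final Queue<String> pending = new ArrayDeque<>();
    private boolean chainReady = false;
    private int writes = 0;

    // Called from the channel handler when an OFPT_PORT_STATUS message arrives.
    public synchronized void onPortStatus(String message) {
        if (chainReady) {
            writeToTransaction(message);
        } else {
            // Hold the message instead of losing the port update.
            pending.add(message);
        }
    }

    // Called once "Starting clustering MASTER services" has completed and a
    // WriteTransaction can actually be obtained.
    public synchronized void onMasterServicesStarted() {
        chainReady = true;
        while (!pending.isEmpty()) {
            writeToTransaction(pending.poll());
        }
    }

    // Stand-in for writeTx.put(store, path, data, createParents).
    private void writeToTransaction(String message) {
        writes++;
    }

    public synchronized int writeCount() {
        return writes;
    }
}
```

The buffering happens under the same lock as the replay, mirroring the txLock synchronization in getTransactionSafely, so no message can slip between the readiness check and the queue.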

Comment by Hideyuki Tai [ 02/Sep/16 ]

(In reply to Shuva Jyoti Kar from comment #1)
> However since :
> https://logs.opendaylight.org/releng/jenkins092/vtn-csit-1node-manager-only-
> boron/637/archives/karaf.log.gz
> shows that there are 34 flow addition failures of which
> a. 19 are due to error type BADACTION code BADSETARGUMENT
> b. 10 due to Device disconnected
> c. 5 due to java.util.concurrent.CancellationException: Task was cancelled

Thank you for checking the log messages!

However, none of those flow addition failures are related to the CSIT test failures, so we should ignore them in this bug report.

In the VTN CSIT, some tests are run first with OF1.0 switches, and then some tests are run with OF1.3 switches.

The test failures this bug report covers occurred in the OF1.0 tests.
On the other hand, all of the flow addition failures you pointed out occurred in the OF1.3 tests, and those flow addition failures do not cause the CSIT to fail.

Also, the following two types of error messages are expected. The test scripts stop and start mininet a few times to change its topology and cover many scenarios. When mininet is stopped, these types of error messages are output, but that is expected and does not introduce any test failures.

> b. 10 due to Device disconnected
> c. 5 due to java.util.concurrent.CancellationException: Task was cancelled

On the other hand, the following error is not good.
The VTN project needs to investigate it, although it does not cause the CSIT to fail, so it is probably not a major bug. I am guessing it is a bug in the CSIT test code. I am filing a new bug report to deal with this issue.

> a. 19 are due to error type BADACTION code BADSETARGUMENT

Anyway, as I explained in my previous comment (#c2), I think the root cause of the CSIT test failure is that the OpenFlow plugin failed to process the port status message and failed to put correct port information into the operational DS. So I think we need to focus on fixing this issue.

Comment by A H [ 02/Sep/16 ]

Is there an ETA for this bug, and is someone assigned to fix it?

Comment by Luis Gomez [ 06/Sep/16 ]

After more testing, this bug only reproduces with OVS 2.0 in OF1.0 (default) mode, the first time you start mininet after the controller restarts. I have added a test to the OF suite to detect this:

https://git.opendaylight.org/gerrit/#/c/45193/

BR/Luis

Comment by A H [ 06/Sep/16 ]

To better assess the impact of this bug and fix, could someone from your team please help us identify the following:
Severity: Could you elaborate on the severity of this bug? Is this a BLOCKER such that we cannot release Boron without it? Is there a workaround such that we can write a release note and fix in future Boron SR1?
Testing: Could you also elaborate on the testing of this patch? How extensively has this patch been tested? Is it covered by any unit tests or system tests?
Impact: Does this fix impact any dependent projects?

Comment by Luis Gomez [ 06/Sep/16 ]

I see just one port status failure in Beryllium:

https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-flow-services-only-beryllium/1827/

This would mean:

  • The issue may not be a regression.
  • It is probably some misbehavior in OVS 2.0 OF1.0.

Therefore I downgrade the importance from Blocker to Major.

Comment by A H [ 08/Sep/16 ]

Has this bug been verified as fixed in the latest Boron RC 3.1 Build?

Comment by Luis Gomez [ 08/Sep/16 ]

Mininet 2.1.0 with OVS 2.0 OF1.0 has the strange behavior that switch s2 connects to the controller with port s2-eth1 down. After some time the switch sends a port status update to the controller, which seems to miss updating the OpenFlow inventory. I still believe there is something odd in the controller's handling of this port status update, but since this only happens with a very specific (and old) version of mininet and OVS, we can definitely lower the priority of this.

Comment by Tomas Slusny [ 05/Jun/17 ]

Can someone retest this on the latest master? A patch resolving issues with missed/early port status messages was merged, and it addressed issues similar to the ones mentioned in this bug.

Comment by Abhijit Kumbhare [ 16/Oct/17 ]

Closing this as there was no update.

Generated at Wed Feb 07 20:33:19 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.