[OPNFLWPLUG-891] Xid cannot be reserved after OVS connection flaps Created: 16/May/17  Updated: 27/Sep/21  Resolved: 28/Aug/17

Status: Resolved
Project: OpenFlowPlugin
Component/s: General
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Vinh Nguyen Assignee: Jozef Bacigal
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: File karaf.7z     File karaf.log.10.7z     File karaf.log.11.7z     File karaf.log.12.7z     File karaf.log.13.7z     File karaf.log.14.7z     File karaf.log.15.7z     File karaf.log.16.7z     File karaf.log.17.7z     File karaf.log.18.7z     File karaf.log.7z     File karaf.log.tar    
External issue ID: 8478

 Description   

When defining large number of networks/subnets in bulk, OVS connection flaps frequently and eventually remains disconnected.

Environment: single openstack node and single ODL.
ODL version: carbon -
distribution-karaf-0.6.0-20170516.135813-5306.tar.gz

To reproduce:

1) using script to define large number of networks/subnets:

neutron net-create vx-net$j-$i --provider:network_type vxlan
neutron subnet-create vx-net$j-$i 10.$j.$i.0/24 -name vx-subnet$j$i --dns-nameserver 8.8.8.8

where j is [1..15] and j is [1..100]

2) Around 1000 networks, OVS connection flaps and eventually remains disconneted.



 Comments   
Comment by Vinh Nguyen [ 16/May/17 ]

Attachment karaf.log.tar has been added with description: karar.log.tar

Comment by Jozef Bacigal [ 05/Jun/17 ]

Hi Vinh,

can you please run this again on latest snapshots with bigger log size and DEBUG log for org.opendayligh.openflowplugin.impl

Thanks

Jozef

Comment by Vinh Nguyen [ 10/Jun/17 ]

Hi Jozef, I am not able to reproduce this problem with the latest snapshot. However since this issue happened intermittently when I logged it, I'll try some more runs to make sure it is gone in the latest code base

Comment by Vinh Nguyen [ 14/Jun/17 ]

Attachment karaf.log.18.7z has been added with description: karaf.log.18

Comment by Vinh Nguyen [ 14/Jun/17 ]

Attachment karaf.log.17.7z has been added with description: karaf.log.17

Comment by Vinh Nguyen [ 14/Jun/17 ]

Attachment karaf.log.16.7z has been added with description: karaf.log.16

Comment by Vinh Nguyen [ 14/Jun/17 ]

Attachment karaf.log.15.7z has been added with description: karaf.log.15

Comment by Vinh Nguyen [ 14/Jun/17 ]

Attachment karaf.log.14.7z has been added with description: karaf.log.14

Comment by Vinh Nguyen [ 14/Jun/17 ]

Attachment karaf.log.13.7z has been added with description: karaf.log.13

Comment by Vinh Nguyen [ 14/Jun/17 ]

Attachment karaf.log.12.7z has been added with description: karaf.log.12

Comment by Vinh Nguyen [ 14/Jun/17 ]

Attachment karaf.log.11.7z has been added with description: karaf.log.11

Comment by Vinh Nguyen [ 14/Jun/17 ]

Attachment karaf.log.10.7z has been added with description: karaf.log.10

Comment by Vinh Nguyen [ 14/Jun/17 ]

I cannot reproduce the original issue with latest ODL snapshot. However I encounter another issue:

When defining large number of networks/subnets in bulk, OVS connection flaps frequently and new requests for network/subnet creation results in ERROR/WARNING in log files:

"No queue present, failing request"
"Xid cannot be reserved for new RequestContext"

Environment: single openstack node and single ODL.
ODL version: distribution-karaf-0.6.1-20170609.000930-229.tar.gz

To reproduce:

1) using script to define large number of networks/subnets:

neutron net-create vx-net$j-$i --provider:network_type vxlan
neutron subnet-create vx-net$j-$i 10.$j.$i.0/24 -name vx-subnet$j$i --dns-nameserver 8.8.8.8

where j is [1..30] and j is [1..100]

2) OVS connection to OFPlugin flaps several times but ODL recovers and keeps up the with continuous network creation requests

3) Around 3000 network creations, OVS connection flaps, this time new network creation requests result in ERROR/WARNING in karaf.log:

"No queue present, failing request"
"Xid cannot be reserved for new RequestContext"

I attached the karaf log files captured before and when the issus happens.

Some interesting notes:

1) The "Xid cannot be reserved " issue happens in karaf.log.10

2) There are two consecutive ConnectionEvent for "Device connected" in the last connection flap.

Normally, the karaf log shows two events: one for disconnection and one for connection when the connection flaps:

> grep ConnectionEvent
karaf.log.18:2017-06-13 03:26:05,245 | INFO | entLoopGroup-7-4 | SystemNotificationsListenerImpl | 306 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | ConnectionEvent: Connection closed by device, Device:/10.138.0.2:49318, NodeId:openflow:158835981449481
karaf.log.18:2017-06-13 03:26:05,868 | INFO | entLoopGroup-7-5 | DeviceManagerImpl | 306 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | ConnectionEvent: Device connected to controller, Device:/10.138.0.2:53148, NodeId:Uri [_value=openflow:158835981449481]
karaf.log.15:2017-06-13 04:54:15,286 | INFO | entLoopGroup-7-5 | SystemNotificationsListenerImpl | 306 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | ConnectionEvent: Connection closed by device, Device:/10.138.0.2:53148, NodeId:openflow:158835981449481
karaf.log.15:2017-06-13 04:54:15,827 | INFO | entLoopGroup-7-6 | DeviceManagerImpl | 306 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | ConnectionEvent: Device connected to controller, Device:/10.138.0.2:60476, NodeId:Uri [_value=openflow:158835981449481]
karaf.log.10:2017-06-13 06:10:25,663 | INFO | entLoopGroup-7-8 | DeviceManagerImpl | 306 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | ConnectionEvent: Device connected to controller, Device:/10.138.0.2:36790, NodeId:Uri [_value=openflow:158835981449481]

3) In between the two "Device Connected" ConnectionEvents described above, there is an handshake attempt from the OVS. OF Plugin sees thatthis device is already connected and closing all connections:

2017-06-13 06:10:23,680 | INFO | entLoopGroup-7-7 | ConnectionAdapterImpl | 295 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.9.1.SNAPSHOT | Hello received
2017-06-13 06:10:23,681 | DEBUG | entLoopGroup-7-7 | nflowProtocolListenerInitialImpl | 306 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | processing HELLO.xid: 36 from device /10.138.0.2:36788
2017-06-13 06:10:23,682 | DEBUG | entLoopGroup-7-7 | HandshakeListenerImpl | 306 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | handshake succeeded: /10.138.0.2:36788
2017-06-13 06:10:23,683 | DEBUG | entLoopGroup-7-7 | HandshakeListenerImpl | 306 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | succeeded by getting sweep barrier after post-handshake for device openflow:158835981449481
2017-06-13 06:10:23,683 | INFO | entLoopGroup-7-7 | ContextChainHolderImpl | 306 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Device openflow:158835981449481 connected.
2017-06-13 06:10:23,683 | WARN | entLoopGroup-7-7 | ContextChainHolderImpl | 306 - org.opendaylight.openflowplugin.impl - 0.4.1.SNAPSHOT | Device openflow:158835981449481 already connected. Closing all connection to the device.

Comment by Jozef Bacigal [ 26/Jun/17 ]

Recheck this please after this patch

https://git.opendaylight.org/gerrit/#/c/57814/

Comment by Vinh Nguyen [ 29/Jun/17 ]

The problem still exists after the patch:

https://git.opendaylight.org/gerrit/#/c/57814/

The "Xid cannot be reserved " problem still happens after the duplicate connection is received:

"Device openflow:158835981449481 already connected. Closing all connection to the device"

Comment by Jozef Bacigal [ 03/Jul/17 ]

OK and now the double connection problem

https://git.opendaylight.org/gerrit/#/c/59557/

Try please this patch.

Thanks

Comment by Vinh Nguyen [ 05/Jul/17 ]

Karaf.log for patch 59557

Comment by Vinh Nguyen [ 05/Jul/17 ]

Attachment karaf.log.7z has been added with description: karaf.log.patch59557

Comment by Vinh Nguyen [ 05/Jul/17 ]

The patch 59557 doesn't fix the problem. New flows are still not pushed when new connection handshake received for already-connected OVS device.

Please see the attached karaf logs (last 10 logs, the already-connected connection issue happens in karaf.log file)

Comment by Vinh Nguyen [ 05/Jul/17 ]

(In reply to Vinh Nguyen from comment #16)
> Created attachment 1896 [details]
> karaf.log.patch59557
>
> Karaf.log for patch 59557

Please ignore this attachment

Comment by Vinh Nguyen [ 05/Jul/17 ]

Attachment karaf.7z has been added with description: New karaf.log for 59557

Comment by Vinh Nguyen [ 05/Jul/17 ]

(In reply to Vinh Nguyen from comment #17)
> The patch 59557 doesn't fix the problem. New flows are still not pushed when
> new connection handshake received for already-connected OVS device.
>
> Please see the attached karaf logs (last 10 logs, the already-connected
> connection issue happens in karaf.log file)

The error messages "Xid cannot be reserved.." are not seen in the logs but the problem now seems that odl connection are dropped but new connection is not processed properly.

Comment by Jozef Bacigal [ 14/Aug/17 ]

Closing this, connection bug is 8805

Generated at Wed Feb 07 20:33:39 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.