[BGPCEP-431] PCEP session goes down on not sending SRP object in an update packet Created: 08/Apr/16  Updated: 03/Mar/19  Resolved: 12/May/16

Status: Resolved
Project: bgpcep
Component/s: PCEP
Affects Version/s: Bugzilla Migration
Fix Version/s: Bugzilla Migration

Type: Bug
Reporter: Ajay Chhabria Assignee: Milos Fabian
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: JPEG File PCEP_UPDATE_without_SRP.jpg     File SRP_object_included.pcap     File SRP_object_missing.pcap     Text File pcep_session_down.log    
External issue ID: 5689

 Description   

Along with the PCEP session going down, there is an unhandled PCErr message noticed in the karaf logs of the controller. Please find the snapshot of the logs seen:

2016-04-08 02:24:50,208 | WARN | oupCloseable-6-2 | tateful07TopologySessionListener | 282 - org.opendaylight.bgpcep.pcep-topology-provider - 0.4.4.Lithium-SR4 | Unhandled PCErr message PcerrMessage [_errors=[Errors [_errorObject=ErrorObject [_type=6, _value=10, _ignore=false, _processingRule=false, augmentation=[]], augmentation=[]]], augmentation=[]].
2016-04-08 02:24:50,208 | INFO | oupCloseable-6-2 | AbstractTopologySessionListener | 282 - org.opendaylight.bgpcep.pcep-topology-provider - 0.4.4.Lithium-SR4 | Unhandled message Pcerr [_pcerrMessage=PcerrMessage [_errors=[Errors [_errorObject=ErrorObject [_type=6, _value=10, _ignore=false, _processingRule=false, augmentation=[]], augmentation=[]]], augmentation=[]], augmentation=[]] on session PCEPSessionImpl

{channel=[id: 0xd566f018, /10.18.132.98:22756 => /10.18.162.90:4189], localOpen=Open [_deadTimer=120, _keepalive=30, _sessionId=0, _tlvs=Tlvs [augmentation=[Tlvs1 [_srPceCapability=SrPceCapability [_msd=0, augmentation=[]]], Tlvs1 [_stateful=Stateful [_lspUpdateCapability=true, augmentation=[Stateful1 [_initiation=true]]]]]], augmentation=[]], remoteOpen=Open [_deadTimer=30, _keepalive=5, _sessionId=1, _tlvs=Tlvs [augmentation=[Tlvs1 [], Tlvs1 [_stateful=Stateful [_lspUpdateCapability=true, augmentation=[Stateful1 [_initiation=true]]]]]], _version=ProtocolVersion [_value=1], _ignore=false, _processingRule=false, augmentation=[]]}

 Comments   
Comment by Ajay Chhabria [ 08/Apr/16 ]

Attachment SRP_object_included.pcap has been added with description: PCAP files with SRP objects in an update packet.

Comment by Ajay Chhabria [ 08/Apr/16 ]

Attachment SRP_object_missing.pcap has been added with description: PCAP file with SRP object NOT sent in an UPDATE packet

Comment by Milos Fabian [ 11/Apr/16 ]

The SRP is mandatory object in RCUpd message, hence error message is expected.
Ref.: https://tools.ietf.org/html/draft-ietf-pce-stateful-pce-07#section-7.2

The warning in logs is basically harmless, just notifying user that received error message is not binded to any request (as there is no SRP in Error message), so no action is done in the application.

Could you please provide a piece of the log when session went down? Did PCC requested the session closing? How did you managed to send update without SRP object?

Comment by Ajay Chhabria [ 11/Apr/16 ]

Hi Milos,

We changed the controller code to not send the SRP object. We wanted to generate PCEP error type 6 and value 10. Find the snapshot of the log that shows the PCEP session went down unexpectedly. I am attaching the entire karaf file as well to this bug. Let me know if you would need any further information.

2016-04-08 02:10:48,747 | WARN | oupCloseable-6-1 | AbstractTopologySessionListener | 282 - org.opendaylight.bgpcep.pcep-topology-provider - 0.4.4.Lithium-SR4 | Session PCEPSessionImpl

{channel=[id: 0xc238978e, /10.18.132.98:57951 :> /10.18.162.90:4189], localOpen=Open [_deadTimer=120, _keepalive=30, _sessionId=0, _tlvs=Tlvs [augmentation=[Tlvs1 [_stateful=Stateful [_lspUpdateCapability=true, augmentation=[Stateful1 [_initiation=true]]]], Tlvs1 [_srPceCapability=SrPceCapability [_msd=0, augmentation=[]]]]], augmentation=[]], remoteOpen=Open [_deadTimer=30, _keepalive=5, _sessionId=1, _tlvs=Tlvs [augmentation=[Tlvs1 [_stateful=Stateful [_lspUpdateCapability=true, augmentation=[Stateful1 [_initiation=true]]]], Tlvs1 []]], _version=ProtocolVersion [_value=1], _ignore=false, _processingRule=false, augmentation=[]]}

went down unexpectedly
java.io.IOException: End of input detected. Close the session.
at org.opendaylight.protocol.pcep.impl.PCEPSessionImpl.endOfInput(PCEPSessionImpl.java:266)[268:org.opendaylight.bgpcep.pcep-impl:0.4.4.Lithium-SR4]
at org.opendaylight.protocol.framework.AbstractProtocolSession.channelInactive(AbstractProtocolSession.java:40)[54:org.opendaylight.controller.protocol-framework:0.6.4.Lithium-SR4]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)[51:io.netty.transport:4.0.26.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223)[51:io.netty.transport:4.0.26.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:294)[73:io.netty.codec:4.0.26.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)[51:io.netty.transport:4.0.26.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223)[51:io.netty.transport:4.0.26.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:294)[73:io.netty.codec:4.0.26.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)[51:io.netty.transport:4.0.26.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223)[51:io.netty.transport:4.0.26.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:829)[51:io.netty.transport:4.0.26.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:610)[51:io.netty.transport:4.0.26.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)[52:io.netty.common:4.0.26.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)[51:io.netty.transport:4.0.26.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)[52:io.netty.common:4.0.26.Final]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[52:io.netty.common:4.0.26.Final]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_95]
2016-04-08 02:10:48,749 | WARN | oupCloseable-6-1 | DefaultChannelPipeline | 52 - io.netty.common - 4.0.26.Final | An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.lang.IllegalStateException: New transaction PingPongTransaction

{delegate=org.opendaylight.controller.cluster.databroker.DOMBrokerReadWriteTransaction@4abfcdd1}

raced with transacion PingPongTransaction

{delegate=org.opendaylight.controller.cluster.databroker.DOMBrokerReadWriteTransaction@655a0f70}

at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.slowAllocateTransaction(PingPongTransactionChain.java:136)[167:org.opendaylight.controller.sal-broker-impl:1.2.4.Lithium-SR4]
at org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.allocateTransaction(PingPongTransactionChain.java:148)[167:org.opendaylight.controller.sal-broker-impl:1.2.4.Lithium-SR4]

Comment by Ajay Chhabria [ 11/Apr/16 ]

Attachment pcep_session_down.log has been added with description: Karaf Log

Comment by Ajay Chhabria [ 11/Apr/16 ]

Adding one more snapshot to this bug. When a LSP update request without SRP object was sent from the controller it was waiting for a response.

Comment by Ajay Chhabria [ 11/Apr/16 ]

Attachment PCEP_UPDATE_without_SRP.jpg has been added with description: LSP update request without SRP object waiting for a response

Comment by Milos Fabian [ 11/Apr/16 ]

(In reply to Ajay Chhabria from comment #3)
> Hi Milos,
>
> We changed the controller code to not send the SRP object. We wanted to
> generate PCEP error type 6 and value 10. Find the snapshot of the log that
> shows the PCEP session went down unexpectedly. I am attaching the entire
> karaf file as well to this bug. Let me know if you would need any further
> information.
>
> 2016-04-08 02:10:48,747 | WARN | oupCloseable-6-1 |
> AbstractTopologySessionListener | 282 -
> org.opendaylight.bgpcep.pcep-topology-provider - 0.4.4.Lithium-SR4 | Session
> PCEPSessionImpl

{channel=[id: 0xc238978e, /10.18.132.98:57951 :> > /10.18.162.90:4189], localOpen=Open [_deadTimer=120, _keepalive=30, > _sessionId=0, _tlvs=Tlvs [augmentation=[Tlvs1 [_stateful=Stateful > [_lspUpdateCapability=true, augmentation=[Stateful1 [_initiation=true]]]], > Tlvs1 [_srPceCapability=SrPceCapability [_msd=0, augmentation=[]]]]], > augmentation=[]], remoteOpen=Open [_deadTimer=30, _keepalive=5, > _sessionId=1, _tlvs=Tlvs [augmentation=[Tlvs1 [_stateful=Stateful > [_lspUpdateCapability=true, augmentation=[Stateful1 [_initiation=true]]]], > Tlvs1 []]], _version=ProtocolVersion [_value=1], _ignore=false, > _processingRule=false, augmentation=[]]}

went down unexpectedly
> java.io.IOException: End of input detected. Close the session.
> at
> org.opendaylight.protocol.pcep.impl.PCEPSessionImpl.
> endOfInput(PCEPSessionImpl.java:266)[268:org.opendaylight.bgpcep.pcep-impl:0.
> 4.4.Lithium-SR4]
> at
> org.opendaylight.protocol.framework.AbstractProtocolSession.
> channelInactive(AbstractProtocolSession.java:40)[54:org.opendaylight.
> controller.protocol-framework:0.6.4.Lithium-SR4]
> at
> io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelInactive(AbstractChannelHandlerContext.java:237)[51:io.netty.
> transport:4.0.26.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.
> fireChannelInactive(AbstractChannelHandlerContext.java:223)[51:io.netty.
> transport:4.0.26.Final]
> at
> io.netty.handler.codec.ByteToMessageDecoder.
> channelInactive(ByteToMessageDecoder.java:294)[73:io.netty.codec:4.0.26.
> Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelInactive(AbstractChannelHandlerContext.java:237)[51:io.netty.
> transport:4.0.26.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.
> fireChannelInactive(AbstractChannelHandlerContext.java:223)[51:io.netty.
> transport:4.0.26.Final]
> at
> io.netty.handler.codec.ByteToMessageDecoder.
> channelInactive(ByteToMessageDecoder.java:294)[73:io.netty.codec:4.0.26.
> Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelInactive(AbstractChannelHandlerContext.java:237)[51:io.netty.
> transport:4.0.26.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.
> fireChannelInactive(AbstractChannelHandlerContext.java:223)[51:io.netty.
> transport:4.0.26.Final]
> at
> io.netty.channel.DefaultChannelPipeline.
> fireChannelInactive(DefaultChannelPipeline.java:829)[51:io.netty.transport:4.
> 0.26.Final]
> at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:
> 610)[51:io.netty.transport:4.0.26.Final]
> at
> io.netty.util.concurrent.SingleThreadEventExecutor.
> runAllTasks(SingleThreadEventExecutor.java:357)[52:io.netty.common:4.0.26.
> Final]
> at
> io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)[51:io.netty.
> transport:4.0.26.Final]
> at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.
> run(SingleThreadEventExecutor.java:111)[52:io.netty.common:4.0.26.Final]
> at
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.
> run(DefaultThreadFactory.java:137)[52:io.netty.common:4.0.26.Final]
> at java.lang.Thread.run(Thread.java:745)[:1.7.0_95]
> 2016-04-08 02:10:48,749 | WARN | oupCloseable-6-1 | DefaultChannelPipeline
> | 52 - io.netty.common - 4.0.26.Final | An exceptionCaught() event was
> fired, and it reached at the tail of the pipeline. It usually means the last
> handler in the pipeline did not handle the exception.
> java.lang.IllegalStateException: New transaction
> PingPongTransaction

{delegate=org.opendaylight.controller.cluster.databroker. > DOMBrokerReadWriteTransaction@4abfcdd1}

raced with transacion
> PingPongTransaction

{delegate=org.opendaylight.controller.cluster.databroker. > DOMBrokerReadWriteTransaction@655a0f70}

> at
> org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.
> slowAllocateTransaction(PingPongTransactionChain.java:136)[167:org.
> opendaylight.controller.sal-broker-impl:1.2.4.Lithium-SR4]
> at
> org.opendaylight.controller.md.sal.dom.broker.impl.PingPongTransactionChain.
> allocateTransaction(PingPongTransactionChain.java:148)[167:org.opendaylight.
> controller.sal-broker-impl:1.2.4.Lithium-SR4]

The log says the session was terminated because of expired dead-timer, which mean the ODL PCE did not received any message from PCC for some time (30s in this case), however something goes wrong at closing as there are some exceptions cumming from MD-SAL.
Could you check what is happening on the router (PCC) side?

Comment by Milos Fabian [ 11/Apr/16 ]

(In reply to Ajay Chhabria from comment #5)
> Adding one more snapshot to this bug. When a LSP update request without SRP
> object was sent from the controller it was waiting for a response.

This infinite waiting problem is tracked in https://bugs.opendaylight.org/show_bug.cgi?id=5612

Comment by Ajay Chhabria [ 12/Apr/16 ]

Thank You Milos for the comments. I now repeated the test without configuring the dead timer on the XRv. I think the default value is 120 seconds. I now see the following exception:

2016-04-12 11:20:12,008 | INFO | oupCloseable-5-3 | AbstractTopologySessionListener | 282 - org.opendaylight.bgpcep.pcep-topology-provider - 0.4.4.Lithium-SR4 | Unhandled message Pcerr [_pcerrMessage=PcerrMessage [_errors=[Errors [_errorObject=ErrorObject [_type=6, _value=10, _ignore=false, _processingRule=false, augmentation=[]], augmentation=[]]], augmentation=[]], augmentation=[]] on session PCEPSessionImpl

{channel=[id: 0xfafea19e, /10.18.132.94:60011 => /10.18.160.123:4189], localOpen=Open [_deadTimer=120, _keepalive=30, _sessionId=1, _tlvs=Tlvs [augmentation=[Tlvs1 [_srPceCapability=SrPceCapability [_msd=0, augmentation=[]]], Tlvs1 [_stateful=Stateful [_lspUpdateCapability=true, augmentation=[Stateful1 [_initiation=true]]]]]], augmentation=[]], remoteOpen=Open [_deadTimer=120, _keepalive=30, _sessionId=2, _tlvs=Tlvs [augmentation=[Tlvs1 [], Tlvs1 [_stateful=Stateful [_lspUpdateCapability=true, augmentation=[Stateful1 [_initiation=true]]]]]], _version=ProtocolVersion [_value=1], _ignore=false, _processingRule=false, augmentation=[]]}

2016-04-12 11:20:21,728 | INFO | oupCloseable-5-2 | AbstractSessionNegotiator | 54 - org.opendaylight.controller.protocol-framework - 0.6.4.Lithium-SR4 | Failed to send message Pcerr [_pcerrMessage=PcerrMessage [_errors=[Errors [_errorObject=ErrorObject [_type=1, _value=2, augmentation=[]], augmentation=[]]], augmentation=[]], augmentation=[]]
java.nio.channels.ClosedChannelException
2016-04-12 11:20:21,730 | WARN | oupCloseable-5-2 | DefaultPromise | 52 - io.netty.common - 4.0.26.Final | An exception was thrown by org.opendaylight.protocol.framework.AbstractSessionNegotiator$1.operationComplete()
java.lang.IllegalStateException: complete already: DefaultPromise@19303a72(failure(java.io.IOException: Connection reset by peer)
at io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:418)[52:io.netty.common:4.0.26.Final]
at org.opendaylight.protocol.framework.AbstractSessionNegotiator.negotiationFailed(AbstractSessionNegotiator.java:53)[54:org.opendaylight.controller.protocol-framework:0.6.4.Lithium-SR4]
at org.opendaylight.protocol.framework.AbstractSessionNegotiator$1.operationComplete(AbstractSessionNegotiator.java:69)[54:org.opendaylight.controller.protocol-framework:0.6.4.Lithium-SR4]
at org.opendaylight.protocol.framework.AbstractSessionNegotiator$1.operationComplete(AbstractSessionNegotiator.java:64)[54:org.opendaylight.controller.protocol-framework:0.6.4.Lithium-SR4]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)[52:io.netty.common:4.0.26.Final]
at io.netty.util.concurrent.DefaultPromise.notifyLateListener(DefaultPromise.java:621)[52:io.netty.common:4.0.26.Final]
at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:138)[52:io.netty.common:4.0.26.Final]
at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:93)[51:io.netty.transport:4.0.26.Final]
at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:28)[51:io.netty.transport:4.0.26.Final]
at org.opendaylight.protocol.framework.AbstractSessionNegotiator.sendMessage(AbstractSessionNegotiator.java:63)[54:org.opendaylight.controller.protocol-framework:0.6.4.Lithium-SR4]
at org.opendaylight.protocol.pcep.impl.AbstractPCEPSessionNegotiator.sendErrorMessage(AbstractPCEPSessionNegotiator.java:145)[268:org.opendaylight.bgpcep.pcep-impl:0.4.4.Lithium-SR4]
at org.opendaylight.protocol.pcep.impl.AbstractPCEPSessionNegotiator.access$100(AbstractPCEPSessionNegotiator.java:45)[268:org.opendaylight.bgpcep.pcep-impl:0.4.4.Lithium-SR4]
at org.opendaylight.protocol.pcep.impl.AbstractPCEPSessionNegotiator$1.run(AbstractPCEPSessionNegotiator.java:167)[268:org.opendaylight.bgpcep.pcep-impl:0.4.4.Lithium-SR4]
at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)[52:io.netty.common:4.0.26.Final]
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)[52:io.netty.common:4.0.26.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)[52:io.netty.common:4.0.26.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)[51:io.netty.transport:4.0.26.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)[52:io.netty.common:4.0.26.Final]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[52:io.netty.common:4.0.26.Final]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_75]
Caused by: java.nio.channels.ClosedChannelException

Comment by Milos Fabian [ 13/Apr/16 ]

The PCEP-ERROR Type=1 Value=2 -> "no Open message received before the expiration of the OpenWait timer".
Looks like something wrong is going on there...

I will try to reproduce it with the latest Beryllium.

Comment by Milos Fabian [ 19/Apr/16 ]

The problem happens when ODL PCE receive unhanded message (i.e. Error message w/o SRP) -> new transaction is started but never submitted, hence it cause race-condition between transactions.

Comment by Milos Fabian [ 19/Apr/16 ]

The session is closed by remote end. The exception mentioned above, is risen when session goes down.

Comment by Milos Fabian [ 19/Apr/16 ]

master: https://git.opendaylight.org/gerrit/#/c/37806/

Comment by Kevin Wang [ 11/May/16 ]

Beryllium cherry-pick
https://git.opendaylight.org/gerrit/#/c/38308/2

Generated at Wed Feb 07 19:13:02 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.