Right now PCEP session is not recoverable once there is an exception happening during the session negotiation stage.
The following exception is observed when an failure in netty happens.
2017-08-28 12:08:47,890 | WARN | upCloseable-2-12 | DefaultChannelPipeline | 100 - io.netty.common - 4.0.33.Final | An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.channel.ChannelPipelineException: org.opendaylight.protocol.pcep.impl.PCEPSessionImpl.handlerAdded() has thrown an exception; removed.
at io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:500)[101:io.netty.transport:4.0.33.Final]
at io.netty.channel.DefaultChannelPipeline.callHandlerAdded(DefaultChannelPipeline.java:482)[101:io.netty.transport:4.0.33.Final]
at io.netty.channel.DefaultChannelPipeline.replace0(DefaultChannelPipeline.java:455)[101:io.netty.transport:4.0.33.Final]
at io.netty.channel.DefaultChannelPipeline.replace(DefaultChannelPipeline.java:405)[101:io.netty.transport:4.0.33.Final]
at io.netty.channel.DefaultChannelPipeline.replace(DefaultChannelPipeline.java:372)[101:io.netty.transport:4.0.33.Final]
at org.opendaylight.protocol.pcep.impl.AbstractSessionNegotiator.negotiationSuccessful(AbstractSessionNegotiator.java:35)[208:org.opendaylight.bgpcep.pcep-impl:0.5.2.Beryllium-SR2]
at org.opendaylight.protocol.pcep.impl.AbstractPCEPSessionNegotiator.handleMessageKeepAlive(AbstractPCEPSessionNegotiator.java:227)[208:org.opendaylight.bgpcep.pcep-impl:0.5.2.Beryllium-SR2]
at org.opendaylight.protocol.pcep.impl.AbstractPCEPSessionNegotiator.handleMessageKeepWait(AbstractPCEPSessionNegotiator.java:216)[208:org.opendaylight.bgpcep.pcep-impl:0.5.2.Beryllium-SR2]
at org.opendaylight.protocol.pcep.impl.AbstractPCEPSessionNegotiator.handleMessage(AbstractPCEPSessionNegotiator.java:344)[208:org.opendaylight.bgpcep.pcep-impl:0.5.2.Beryllium-SR2]
at org.opendaylight.protocol.pcep.impl.AbstractSessionNegotiator.channelRead(AbstractSessionNegotiator.java:78)[208:org.opendaylight.bgpcep.pcep-impl:0.5.2.Beryllium-SR2]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)[101:io.netty.transport:4.0.33.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)[101:io.netty.transport:4.0.33.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:276)[102:io.netty.codec:4.0.33.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:263)[102:io.netty.codec:4.0.33.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)[101:io.netty.transport:4.0.33.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)[101:io.netty.transport:4.0.33.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:276)[102:io.netty.codec:4.0.33.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:354)[102:io.netty.codec:4.0.33.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)[102:io.netty.codec:4.0.33.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)[101:io.netty.transport:4.0.33.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)[101:io.netty.transport:4.0.33.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)[101:io.netty.transport:4.0.33.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)[101:io.netty.transport:4.0.33.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)[101:io.netty.transport:4.0.33.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)[101:io.netty.transport:4.0.33.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)[101:io.netty.transport:4.0.33.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)[101:io.netty.transport:4.0.33.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)[100:io.netty.common:4.0.33.Final]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[100:io.netty.common:4.0.33.Final]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_76]
Caused by: java.lang.IllegalStateException: Could not register runtime bean in ModuleIdentifier
under name org.opendaylight.controller:ListenerState=12.122.8.10,instanceName=pcep-topology,type=RuntimeBean,moduleFactoryName=pcep-topology-provider
at org.opendaylight.controller.config.manager.impl.jmx.RootRuntimeBeanRegistratorImpl.sanitize(RootRuntimeBeanRegistratorImpl.java:53)[90:org.opendaylight.controller.config-manager:0.4.2.Beryllium-SR2]
at org.opendaylight.controller.config.manager.impl.jmx.HierarchicalRuntimeBeanRegistrationImpl.register(HierarchicalRuntimeBeanRegistrationImpl.java:53)[90:org.opendaylight.controller.config-manager:0.4.2.Beryllium-SR2]
at org.opendaylight.controller.config.manager.impl.jmx.HierarchicalRuntimeBeanRegistrationImpl.register(HierarchicalRuntimeBeanRegistrationImpl.java:19)[90:org.opendaylight.controller.config-manager:0.4.2.Beryllium-SR2]
at org.opendaylight.controller.config.yang.pcep.topology.provider.PCEPTopologyProviderRuntimeRegistration.register(PCEPTopologyProviderRuntimeRegistration.java:10)[221:org.opendaylight.bgpcep.pcep-topology-provider:0.5.2.Beryllium-SR2]
at org.opendaylight.bgpcep.pcep.topology.provider.AbstractTopologySessionListener.register(AbstractTopologySessionListener.java:366)[221:org.opendaylight.bgpcep.pcep-topology-provider:0.5.2.Beryllium-SR2]
at org.opendaylight.bgpcep.pcep.topology.provider.AbstractTopologySessionListener.onSessionUp(AbstractTopologySessionListener.java:185)[221:org.opendaylight.bgpcep.pcep-topology-provider:0.5.2.Beryllium-SR2]
at org.opendaylight.protocol.pcep.impl.PCEPSessionImpl.sessionUp(PCEPSessionImpl.java:373)[208:org.opendaylight.bgpcep.pcep-impl:0.5.2.Beryllium-SR2]
at org.opendaylight.protocol.pcep.impl.PCEPSessionImpl.handlerAdded(PCEPSessionImpl.java:426)[208:org.opendaylight.bgpcep.pcep-impl:0.5.2.Beryllium-SR2]
at io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:487)[101:io.netty.transport:4.0.33.Final]
... 29 more
Caused by: javax.management.InstanceAlreadyExistsException: org.opendaylight.controller:ListenerState=12.122.8.10,instanceName=pcep-topology,type=RuntimeBean,moduleFactoryName=pcep-topology-provider
at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)[:1.7.0_76]
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)[:1.7.0_76]
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)[:1.7.0_76]
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)[:1.7.0_76]
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)[:1.7.0_76]
at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)[:1.7.0_76]
at org.opendaylight.controller.config.manager.impl.jmx.InternalJMXRegistrator.registerMBean(InternalJMXRegistrator.java:77)[90:org.opendaylight.controller.config-manager:0.4.2.Beryllium-SR2]
at org.opendaylight.controller.config.manager.impl.jmx.HierarchicalRuntimeBeanRegistrationImpl.register(HierarchicalRuntimeBeanRegistrationImpl.java:51)[90:org.opendaylight.controller.config-manager:0.4.2.Beryllium-SR2]
... 36 more
User was having some resource issue when the ODL instance is running. "Too many open files" error happens to netty socket. However, after user fixed the resource issue, some PCEP session hangs there and cannot recover. It is due to the when the exception happens, PCEP topology for that particular session is already created, but not removed after the exception happens. Thus when the next PCEP session retry comes, PCEP topology fails to be recreated, so PCEP session cannot be restored.
Bottom line is, we should capture any exception happening in handlerAdded(..), then close the PCEP session properly, so later it can recover.