-
Bug
-
Resolution: Done
-
Operating System: All
Platform: All
-
7846
After running ODL for a few days, an internal queue appears to fill up, and SalService no longer allows any action to be performed on the switch. Connectivity with the switch remains established, and the hello messages from the switch are still returned properly.
Restarting the switch does not solve the problem: the controller that holds mastership does not release the cluster singleton service. The other controllers, which hold "candidate" status on the cluster singleton (slave mode on the switch), do release the singleton service.
The problem was solved only after restarting the controller that held mastership of the switch.
I noticed exceptions that may point to the source of the problem. Basically, the following code does not return a requestContext (requestContext is always null).
final RequestContext<O> requestContext = requestContextStack.createRequestContext();
if (requestContext == null)
The following logs indicate that the queue is full.
2017-02-24 09:47:34,239 | TRACE | Thread-101 | AbstractService | 287 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Handling general service call
2017-02-24 09:47:34,239 | TRACE | Thread-101 | RpcContextImpl | 287 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Device queue org.opendaylight.openflowplugin.impl.rpc.RpcContextImpl@3a502d68 at capacity
2017-02-24 09:47:34,239 | TRACE | Thread-101 | AbstractService | 287 - org.opendaylight.openflowplugin.impl - 0.3.1.Boron-SR1 | Request context refused.
The log "Device queue org.opendaylight.openflowplugin.impl.rpc.RpcContextImpl@3a502d68 at capacity" is emitted because the following code cannot acquire the lock (tracker), which is a semaphore.
public <T> RequestContext<T> createRequestContext() {
if (!tracker.tryAcquire()) {
LOG.trace("Device queue {} at capacity", this);
return null;
} else {
LOG.trace("Acquired semaphore for {}, available permits:{} ", nodeInstanceIdentifier.getKey().getId().getValue(), tracker.availablePermits());
}
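As an illustration of how a semaphore-guarded queue can end up permanently "at capacity", here is a minimal, self-contained Java sketch. It is not the openflowplugin code: BoundedRequestTracker, LeakDemo and countRefused are hypothetical names, and the assumption that permits are leaked (acquired but never released on some path) is only one possible explanation, which this report does not confirm.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical stand-in for a semaphore-guarded request tracker (not the real RpcContextImpl). */
class BoundedRequestTracker {
    private final Semaphore tracker;

    BoundedRequestTracker(int capacity) {
        this.tracker = new Semaphore(capacity);
    }

    /** Returns a context token, or null when no permit is available ("at capacity"). */
    Object createRequestContext() {
        if (!tracker.tryAcquire()) {
            return null;
        }
        return new Object();
    }

    /** Returns one permit; must be called once for every non-null context. */
    void releaseRequestContext() {
        tracker.release();
    }

    int availablePermits() {
        return tracker.availablePermits();
    }
}

class LeakDemo {
    /**
     * Issues `requests` sequential requests against a tracker of the given capacity
     * and counts how many are refused. When releaseOnCompletion is false, permits
     * are never handed back (the assumed leak).
     */
    static int countRefused(int capacity, int requests, boolean releaseOnCompletion) {
        BoundedRequestTracker tracker = new BoundedRequestTracker(capacity);
        int refused = 0;
        for (int i = 0; i < requests; i++) {
            Object ctx = tracker.createRequestContext();
            if (ctx == null) {
                refused++;                       // same outcome as "Request context refused."
            } else if (releaseOnCompletion) {
                tracker.releaseRequestContext(); // permit returned, capacity restored
            }
            // else: permit is leaked and never becomes available again
        }
        return refused;
    }

    public static void main(String[] args) {
        System.out.println("refused with leak:    " + countRefused(3, 10, false));
        System.out.println("refused with release: " + countRefused(3, 10, true));
    }
}
```

With a capacity of 3 and 10 requests, the leaking variant refuses 7 requests and the releasing variant refuses none. Once all permits are gone, every later call returns null until the process holding the semaphore is restarted, which would be consistent with the behaviour described above.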