[MDSAL-87] possible deadlocks causing slow feature installations Created: 06/Jun/15  Updated: 09/Mar/18  Resolved: 22/Feb/16

Status: Resolved
Project: mdsal
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Jamo Luhrsen
Assignee: Unassigned
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 3605

 Description   

There have been several complaints on the mailing lists about a DeadlockMonitor warning in karaf.log. When this happens during feature installation, it appears to take an additional 90 seconds for the DeadlockMonitor to restart the service, after which the system does seem to operate properly.

This degrades the user experience when it happens.

There may be several different flavors of this, but here is one easy way to
reproduce it, with a deadlock warning in "distributed-config-datastore-provider":

  • unzip the Lithium RC0 bits
  • start karaf with the clean option
  • install the Helium-based openflowplugin ("feature:install odl-openflowplugin-flow-services-ui")
  • (optional) connect mininet to verify the plugin is OK
  • log out of karaf
  • start karaf with the clean option
  • install the Lithium-based openflowplugin ("feature:install odl-openflowplugin-flow-services-ui-li")
  • monitor the data/log/karaf.log file for DeadlockMonitor warnings
  • you should see something similar to the below.

2015-06-06 09:39:12,989 | WARN | saction-52-54'}} | DeadlockMonitor | 113 - org.opendaylight.controller.config-manager - 0.3.0.Lithium-RC0 | ModuleIdentifier{factoryName='distributed-config-datastore-provider', instanceName='distributed-config-store-module'} did not finish after 89986 ms
2015-06-06 09:39:13,781 | ERROR | config-pusher | DistributedDataStore | 179 - org.opendaylight.controller.sal-distributed-datastore - 1.2.0.Lithium-RC0 | Shared leaders failed to settle in 90 seconds, giving up

As a side note, I have seen this in "binding-broker-impl", although I don't have exact steps to
reproduce that one.

2015-06-02 09:16:46,834 | WARN | saction-50-52'}} | DeadlockMonitor | 113 - org.opendaylight.controller.config-manager - 0.3.0.SNAPSHOT | ModuleIdentifier{factoryName='binding-broker-impl', instanceName='binding-broker-impl'} did not finish after 89990 ms

It's also been reported in "statistics-manager" on the controller-dev email list:

2015-05-26 11:01:34,017 | WARN | saction-33-35'}} | kMonitor$DeadlockMonitorRunnable | 151 - org.opendaylight.controller.config-manager - 0.2.8.Helium-SR3 | ModuleIdentifier{factoryName='statistics-manager', instanceName='statistics-manager'} did not finish after 10984 ms
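
For anyone who wants to check their own karaf.log for the same symptom, here is a minimal Python sketch that pulls out the "did not finish after N ms" warnings shown above. It is not part of OpenDaylight: the names StalledModule and find_stalled_modules are made up for this illustration, and it assumes each log entry sits on a single line in the actual file. It matches on the message text rather than the logger column, since the excerpts above show that column as both "DeadlockMonitor" and "kMonitor$DeadlockMonitorRunnable".

```python
import re
from typing import Iterable, List, NamedTuple


class StalledModule(NamedTuple):
    """One 'did not finish' warning (hypothetical helper type)."""
    timestamp: str
    factory_name: str
    instance_name: str
    elapsed_ms: int


# Assumption: pipe-separated karaf log columns, one entry per line.
# The logger column is deliberately not matched because its text
# differs between the excerpts in this report.
_WARNING = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s*\|\s*WARN\s*\|.*"
    r"ModuleIdentifier\s*\{factoryName='(?P<factory>[^']+)',\s*"
    r"instanceName='(?P<instance>[^']+)'\}\s*"
    r"did not finish after (?P<ms>\d+) ms"
)


def find_stalled_modules(lines: Iterable[str]) -> List[StalledModule]:
    """Return every module warning found in the given log lines."""
    found = []
    for line in lines:
        match = _WARNING.search(line)
        if match:
            found.append(
                StalledModule(
                    timestamp=match.group("ts"),
                    factory_name=match.group("factory"),
                    instance_name=match.group("instance"),
                    elapsed_ms=int(match.group("ms")),
                )
            )
    return found
```

To use it, pass an open file handle for data/log/karaf.log to find_stalled_modules and print the factory_name and elapsed_ms of each result. Lines that do not carry the warning, such as the DistributedDataStore ERROR entry, are skipped.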



 Comments   
Comment by Jamo Luhrsen [ 06/Jun/15 ]

I was not sure what Component to use, so I chose karaf, which feels just as wrong as the other options. ???

Comment by Jamo Luhrsen [ 11/Jun/15 ]

One comment on the controller-dev list had some clues:

https://lists.opendaylight.org/pipermail/controller-dev/2015-June/009605.html

Comment by Robert Varga [ 13/Nov/15 ]

Moving to MD-SAL as per the mailing list comment.

Comment by rui hu [ 11/Dec/15 ]

I also encountered this problem in the Lithium version. I do not know whether it has been fixed in a follow-up Lithium release. Can anyone help me? Thank you.

Comment by Jamo Luhrsen [ 11/Dec/15 ]

I just re-tested this with Lithium-SR3 [0] and did not see the deadlock issue.

[0] https://nexus.opendaylight.org/content/repositories/opendaylight.release/org/opendaylight/integration/distribution-karaf/0.3.3-Lithium-SR3/distribution-karaf-0.3.3-Lithium-SR3.zip

Comment by Jamo Luhrsen [ 11/Dec/15 ]

(In reply to Jamo Luhrsen from comment #5)
> I just re-tested this with Lithium-SR3 [0] and did not see the deadlock
> issue.
>
>
> [0]
> https://nexus.opendaylight.org/content/repositories/opendaylight.release/org/
> opendaylight/integration/distribution-karaf/0.3.3-Lithium-SR3/distribution-
> karaf-0.3.3-Lithium-SR3.zip

I have also tested with master (Beryllium distro from 12/10/2015) and did not
see this.

Comment by Jamo Luhrsen [ 11/Dec/15 ]

(In reply to rui hu from comment #4)
> I also encountered such a problem in the lithium version, I do not know this
> problem, whether in the follow-up version of lithium has been repaired, who
> can help me, thank you

Rui, which Lithium version are you using? Would you be able to try with SR3 and report back?
