CONTROLLER-1613

Clustering: Member fails to re-start sometimes in csit -all- jobs


    • Type: Bug
    • Resolution: Cannot Reproduce
    • clustering
    • Operating System: All
      Platform: All

    • 7858

      The first symptom detected by the Robot suites is a constant 404 on the Jolokia URL. This Carbon bug does not occur in -only- jobs. This weekend it hit both Netconf and Controller jobs; previously it had been seen only once [0].
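
A minimal sketch of the kind of readiness poll that would surface this symptom, assuming the check simply retries an HTTP GET until it stops returning 404. The function names, the timeout values, and the URL in the usage note are illustrative assumptions, not taken from the actual Robot suites; authentication is omitted.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout_s: float = 10.0) -> int:
    """Return the HTTP status code for a GET on url, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (including the constant 404) land here.
        return err.code
    except OSError:
        # Connection refused, DNS failure, socket timeout, etc.
        return 0


def wait_for_jolokia(fetch_status: Callable[[], int],
                     timeout_s: float = 300.0,
                     interval_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll fetch_status() until it returns 200 or timeout_s elapses.

    Returns True once a 200 is seen, False if the deadline passes first.
    """
    deadline = clock() + timeout_s
    while True:
        if fetch_status() == 200:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Usage would look like `wait_for_jolokia(lambda: http_status("http://<member-ip>:8181/jolokia"))`, where the host and port are placeholders. A member hit by this bug would keep returning 404 until the deadline, so the call would return False.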

      The karaf.log for this bug contains reports of runtime-generated-mapping not finishing, followed by "giving up" from AbstractDataStore:

      2017-02-26 04:02:37,092 | WARN | saction-32-34'}} | DeadlockMonitor | 131 - org.opendaylight.controller.config-manager - 0.6.0.SNAPSHOT | ModuleIdentifier{factoryName='runtime-generated-mapping', instanceName='runtime-mapping-singleton'} did not finish after 169982 ms
      2017-02-26 04:02:40,606 | ERROR | Event Dispatcher | AbstractDataStore | 216 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.SNAPSHOT | Shard leaders failed to settle in 90 seconds, giving up
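
To check other job runs for the same signature, the log could be scanned with something like the sketch below. It assumes each entry occupies a single line in karaf.log with six pipe-separated fields (timestamp, level, thread, logger, bundle, message), as in the excerpt above. The helper names and the symptom substrings are illustrative choices, not part of any ODL tooling.

```python
from typing import Iterable, List, NamedTuple, Optional


class KarafEntry(NamedTuple):
    timestamp: str
    level: str
    thread: str
    logger: str
    bundle: str
    message: str


# Substrings taken from the two log messages quoted in this report.
SYMPTOMS = ("did not finish after", "Shard leaders failed to settle")


def parse_karaf_line(line: str) -> Optional[KarafEntry]:
    """Split one log line into its six fields, or return None if it does not fit."""
    # Limit to 5 splits so any '|' inside the message itself is preserved.
    parts = [part.strip() for part in line.split("|", 5)]
    if len(parts) != 6:
        return None
    return KarafEntry(*parts)


def find_symptoms(lines: Iterable[str]) -> List[KarafEntry]:
    """Return the parsed entries whose message contains a known symptom."""
    hits = []
    for line in lines:
        entry = parse_karaf_line(line)
        if entry is not None and any(s in entry.message for s in SYMPTOMS):
            hits.append(entry)
    return hits
```

Running `find_symptoms(open("karaf.log"))` on a member's log would then list the DeadlockMonitor warnings and the AbstractDataStore "giving up" error together. Continuation lines such as stack traces are skipped because they do not have six fields.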

      Possibly this is just a performance bug (startup taking longer than expected), but when the restart succeeds [1], the instance is created in around 10 seconds. Most probably some ODL project sometimes prevents WaitingServiceTracker from finding BindingToNormalizedNodeCodec.

      [0] https://jenkins.opendaylight.org/releng/view/controller/job/controller-csit-3node-clustering-all-carbon/221/
      [1] https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-3node-clustering-all-carbon/199/

            andrej.mak@pantheon.tech Andrej Mak
            vrpolak Vratko Polak