MDSAL-275

VerifyException from flapping service singleton registration

    • Type: Bug
    • Resolution: Done
    • Operating System: All
      Platform: All

    • 8858

      The "flapping service" is a simple test service implemented as a cluster singleton service. Upon instantiation, the flapping service instance tries to de-register the flapping service from the current member; upon close, the instance tries to re-register it on the same member.

      The robot test registers the flapping service on all members, waits a while, and then un-registers it on all members. The same test also runs as a longevity job.
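
      The flapping behaviour described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual FlappingSingletonService code: the Registration and Provider interfaces, the in-memory provider, the maxFlaps limit and the run helper are all hypothetical stand-ins, and the real singleton service API is not used. The only detail taken from the report is that re-registration happens in an executor task scheduled from closeServiceInstance, as the stack trace suggests.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class FlappingSketch {
    /** Hypothetical stand-in for a singleton service registration handle. */
    interface Registration {
        void close();
    }

    /** Hypothetical stand-in for a singleton service provider. */
    interface Provider {
        Registration register(FlappingService service);
    }

    /** Test service that de-registers once instantiated and re-registers once closed. */
    static final class FlappingService {
        private final Provider provider;
        private final ExecutorService executor;
        private final int maxFlaps; // assumption: stop after this many de-registrations
        private final AtomicInteger flaps = new AtomicInteger();
        private final CountDownLatch done = new CountDownLatch(1);
        private volatile Registration registration;

        FlappingService(Provider provider, ExecutorService executor, int maxFlaps) {
            this.provider = provider;
            this.executor = executor;
            this.maxFlaps = maxFlaps;
        }

        void start() {
            executor.execute(this::register);
        }

        private void register() {
            registration = provider.register(this);
        }

        /** Callback: this member now runs the service, so schedule de-registration. */
        void instantiateServiceInstance() {
            executor.execute(() -> registration.close());
        }

        /** Callback: the instance was closed, so schedule re-registration. */
        void closeServiceInstance() {
            if (flaps.incrementAndGet() < maxFlaps) {
                executor.execute(this::register);
            } else {
                done.countDown();
            }
        }
    }

    /** Runs the flapping loop against a trivial in-memory provider; returns the de-registration count. */
    static int run(int maxFlaps) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // The fake provider instantiates the service immediately; closing the
        // registration closes the service instance.
        Provider provider = service -> {
            service.instantiateServiceInstance();
            return service::closeServiceInstance;
        };
        FlappingService service = new FlappingService(provider, executor, maxFlaps);
        service.start();
        try {
            if (!service.done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("flapping loop did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
        return service.flaps.get();
    }

    public static void main(String[] args) {
        System.out.println("de-registrations: " + run(5));
    }
}
```

      The single-threaded executor keeps each register and close step strictly ordered in this sketch. In a real cluster the callbacks arrive asynchronously, which is where a re-registration could race with an unfinished close.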

      This week, the longevity job failed [0] in the re-registration phase after 22 hours and 175814 successful de-registrations. The Karaf log [1] shows a VerifyException:

      2017-07-16 21:10:57,018 | WARN | pool-30-thread-1 | FlappingSingletonService | 257 - org.opendaylight.controller.samples.clustering-it-provider - 1.5.2.SNAPSHOT | There was a problem re-registering flapping singleton service.
      java.lang.RuntimeException: com.google.common.base.VerifyException
      at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.registerService(ClusterSingletonServiceGroupImpl.java:180)
      at org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl.registerClusterSingletonService(AbstractClusterSingletonServiceProviderImpl.java:107)
      at Proxy057965cf_73f3_4f46_a035_5d06b8a2497b.registerClusterSingletonService(Unknown Source)
      at Proxy1a1b6ecc_a451_4f57_8a21_81dd42385c83.registerClusterSingletonService(Unknown Source)
      at org.opendaylight.controller.clustering.it.provider.impl.FlappingSingletonService.lambda$closeServiceInstance$1(FlappingSingletonService.java:78)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)[:1.8.0_131]
      ...

      The garbage collection log [2] shows a short GC pause (0.41 s) just before the exception; it is unclear whether the two are related.

      2017-07-16T21:10:56.605+0000: 78867.300: [GC (Allocation Failure) [PSYoungGen: 598048K->64416K(604160K)] 1060366K->526734K(1238528K), 0.4121114 secs] [Times: user=2.06 sys=0.00, real=0.41 secs]

      [0] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-cs-chasing-leader-longevity-only-carbon/13/log.html.gz#s1-s2-t3-k3-k2-k1-k1-k2-k1-k4-k7-k1
      [1] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-cs-chasing-leader-longevity-only-carbon/13/odl1_karaf.log.gz
      [2] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-cs-chasing-leader-longevity-only-carbon/13/gclogs-1/gc_1500160588.82.log.gz

            rovarga Robert Varga
            vrpolak Vratko Polak