  1. controller
  2. CONTROLLER-1766

The RaftState of one cluster node changed to 'IsolatedLeader' after a large number of Virtual NWs were created.


    • Type: Bug
    • Resolution: Unresolved
    • None
    • Carbon
    • clustering
    • None
    • Operating System: All
      Platform: All

    • 9090

      After a large number of virtual NWs were created, the RaftState of one cluster node became abnormal.
      On that node, RaftState changed from Leader to IsolatedLeader and from Follower to Candidate.
      The RaftState then remained unchanged, even after 16 hours.
      The RaftState of the other 2 nodes did not change at all.
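
      For anyone trying to confirm the stuck state on a live cluster, the RaftState of a shard can usually be read over Jolokia. The following is a minimal sketch, not taken from this report: it assumes the odl-jolokia feature is installed, that the REST port is 8181, and that the shard MBean follows the usual `org.opendaylight.controller:Category=Shards,...` naming. The function names are illustrative only.

```python
import base64
import json
from urllib.request import Request, urlopen

# Assumed MBean "type" values for the two datastores.
STORE_TYPES = {
    "operational": "DistributedOperationalDatastore",
    "config": "DistributedConfigDatastore",
}


def shard_mbean_url(host, member, shard="topology", store="operational", port=8181):
    """Build the Jolokia read URL for one shard replica (port 8181 is an assumption)."""
    name = f"{member}-shard-{shard}-{store}"
    mbean = (
        "org.opendaylight.controller:"
        f"Category=Shards,name={name},type={STORE_TYPES[store]}"
    )
    return f"http://{host}:{port}/jolokia/read/{mbean}"


def raft_state(jolokia_body):
    """Extract RaftState from a Jolokia JSON response body."""
    return json.loads(jolokia_body)["value"]["RaftState"]


def fetch_raft_state(host, member, user, password, **kwargs):
    """Query a live node with HTTP basic auth; credentials are deployment-specific."""
    request = Request(shard_mbean_url(host, member, **kwargs))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urlopen(request, timeout=10) as response:
        return raft_state(response.read().decode("utf-8"))
```

      Polling `fetch_raft_state` for member-1, member-2 and member-3 at intervals would show whether a node stays in IsolatedLeader or Candidate over time.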

      Environment Details:
      OpenStack Version: stable/ocata
      ODL Version: carbon-FR + L2GW related patches
      ODL max heap: JAVA_MAX_MEM="6144m"

      Reproduction steps:
      1. Initialization and set up of Controller/Compute/L2GW Node
      2. Creation of 1000 virtual NWs
      3. Creation of 1 L2GW GATEWAY
      4. Creation of 1000 L2GW CONNECTIONs

      • The problem is intermittent (occurrence ratio: 1/3).
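
      The report does not include the exact commands used. As a rough sketch of steps 2 to 4, the helper below only builds the command lines so they can be reviewed before anything is run. It assumes the `openstack` client and the `neutron` CLI extensions from networking-l2gw are available. The network names, gateway name, device name, interface name and segmentation IDs are all hypothetical placeholders.

```python
def reproduction_commands(count=1000, gateway="gw1",
                          device="hwvtep-device", interface="port1"):
    """Return the CLI invocations for steps 2-4 as argument lists (nothing is executed).

    All names and segmentation IDs are placeholders, not values from the report.
    """
    commands = []

    # Step 2: create the virtual networks.
    for i in range(1, count + 1):
        commands.append(["openstack", "network", "create", f"net-{i}"])

    # Step 3: create a single L2 gateway bound to one device interface.
    commands.append([
        "neutron", "l2-gateway-create", gateway,
        "--device", f"name={device},interface_names={interface}",
    ])

    # Step 4: connect every network to the gateway, one connection each.
    for i in range(1, count + 1):
        commands.append([
            "neutron", "l2-gateway-connection-create", gateway, f"net-{i}",
            "--default-segmentation-id", str(i),
        ])

    return commands


if __name__ == "__main__":
    # Print a small dry run; pass the lists to subprocess.run to execute them.
    for command in reproduction_commands(count=3):
        print(" ".join(command))
```

      Since the problem is intermittent, several runs from a clean setup may be needed before it shows up.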

      Karaf logs:

      • Split Karaf logs are attached.
        The RaftState became abnormal at around 17:20-17:23.

      odl1#
      karaf.log.3:2017-08-08 13:28:41,123 | INFO | lt-dispatcher-22 | RoleChangeNotifier | 204 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.Carbon | RoleChangeNotifier:akka.tcp://opendaylight-cluster-data@192.168.101.8:2550/user/shardmanager-operational/member-1-shard-topology-operational/member-1-shard-topology-operational-notifier#1974074480 created and ready for shard:member-1-shard-topology-operational
      karaf.log.3:2017-08-08 13:28:41,131 | INFO | lt-dispatcher-23 | RoleChangeNotifier | 204 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.Carbon | RoleChangeNotifier for member-1-shard-topology-operational , received role change from null to Follower
      karaf.log.3:2017-08-08 13:28:41,132 | INFO | lt-dispatcher-23 | RoleChangeNotifier | 204 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.Carbon | RoleChangeNotifier for member-1-shard-topology-operational , registered listener akka://opendaylight-cluster-data/user/shardmanager-operational
      karaf.log.3:2017-08-08 13:28:51,162 | INFO | lt-dispatcher-36 | RoleChangeNotifier | 204 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.Carbon | RoleChangeNotifier for member-1-shard-topology-operational , received role change from Follower to Candidate
      karaf.log.3:2017-08-08 13:28:51,165 | INFO | lt-dispatcher-41 | RoleChangeNotifier | 204 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.Carbon | RoleChangeNotifier for member-1-shard-topology-operational , received role change from Candidate to Leader
      karaf.log:2017-08-08 17:20:35,381 | INFO | lt-dispatcher-45 | RoleChangeNotifier | 204 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.Carbon | RoleChangeNotifier for member-1-shard-topology-operational , received role change from Leader to IsolatedLeader

      odl2#
      karaf.log.1:2017-08-08 13:28:41,124 | INFO | lt-dispatcher-23 | RoleChangeNotifier | 204 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.Carbon | RoleChangeNotifier:akka.tcp://opendaylight-cluster-data@192.168.101.13:2550/user/shardmanager-operational/member-2-shard-topology-operational/member-2-shard-topology-operational-notifier#-1501266548 created and ready for shard:member-2-shard-topology-operational
      karaf.log.1:2017-08-08 13:28:41,130 | INFO | ult-dispatcher-3 | RoleChangeNotifier | 204 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.Carbon | RoleChangeNotifier for member-2-shard-topology-operational , received role change from null to Follower
      karaf.log.1:2017-08-08 13:28:41,138 | INFO | ult-dispatcher-3 | RoleChangeNotifier | 204 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.Carbon | RoleChangeNotifier for member-2-shard-topology-operational , registered listener akka://opendaylight-cluster-data/user/shardmanager-operational
      karaf.log:Caused by: akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@192.168.101.8:2550/), Path(/user/shardmanager-operational/member-1-shard-topology-operational/shard-topology-member-2:datastore-operational@0:89656_200508#2123488974)]] after [5000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.ReadData".
      karaf.log:ReadFailedException{message=Error checking ReadData for path /(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)network-topology/topology/topology[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)topology-id=hwvtep:1}]/node/node[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=hwvtep://uuid/44307ca2-4ebf-4470-8d06-7e377959a41c}]/AugmentationIdentifier{childNames=[(urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)remote-ucast-macs, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)managers, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)local-mcast-macs, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)switches, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)logical-routers, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)remote-arp-sources, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)local-ucast-macs, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)logical-binding-stats, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)connection-info, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)acls, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)local-arp-sources, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)remote-mcast-macs, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)logical-switches]}/(urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)logical-switches/logical-switches[{(urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)hwvtep-node-name=a40b416f-18c8-43a8-9466-7759d62e4ce4}], errorList=[RpcError [message=Error checking ReadData for path /(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)network-topology/topology/topology[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)topology-id=hwvtep:1}]/node/node[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=hwvtep://uuid/44307ca2-4ebf-4470-8d06-7e377959a41c}]/AugmentationIdentifier{childNames=[(urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)remote-ucast-macs, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)managers, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)local-mcast-macs, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)switches, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)logical-routers, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)remote-arp-sources, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)local-ucast-macs, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)logical-binding-stats, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)connection-info, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)acls, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)local-arp-sources, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)remote-mcast-macs, (urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)logical-switches]}/(urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)logical-switches/logical-switches[{(urn:opendaylight:params:xml:ns:yang:ovsdb:hwvtep?revision=2015-09-01)hwvtep-node-name=a40b416f-18c8-43a8-9466-7759d62e4ce4}], severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://opendaylight-cluster-data@192.168.101.8:2550/), Path(/user/shardmanager-operational/member-1-shard-topology-operational/shard-topology-member-2:datastore-operational@0:89656_200508#2123488974)]] after [5000 ms]. Sender[null] sent message of type "org.opendaylight.controller.cluster.datastore.messages.ReadData".]]}

      odl3#
      2017-08-08 13:28:41,130 | INFO | lt-dispatcher-20 | RoleChangeNotifier | 204 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.Carbon | RoleChangeNotifier:akka.tcp://opendaylight-cluster-data@192.168.101.15:2550/user/shardmanager-operational/member-3-shard-topology-operational/member-3-shard-topology-operational-notifier#1779121060 created and ready for shard:member-3-shard-topology-operational
      2017-08-08 13:28:41,142 | INFO | lt-dispatcher-22 | RoleChangeNotifier | 204 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.Carbon | RoleChangeNotifier for member-3-shard-topology-operational , received role change from null to Follower
      2017-08-08 13:28:41,142 | INFO | lt-dispatcher-48 | RoleChangeNotifier | 204 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.Carbon | RoleChangeNotifier for member-3-shard-topology-operational , registered listener akka://opendaylight-cluster-data/user/shardmanager-operational
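
      The role transitions in the excerpts above can be pulled out of the full attached logs with a small script. This is a sketch that relies only on the `RoleChangeNotifier ... received role change from X to Y` line format shown in this report. It tolerates the `karaf.log.N:` prefix that grep adds, and the function names are illustrative.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 2017-08-08 17:20:35,381 | INFO | ... | RoleChangeNotifier for
# member-1-shard-topology-operational , received role change from Leader to IsolatedLeader
ROLE_CHANGE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \|.*"
    r"RoleChangeNotifier for (?P<shard>\S+) , received role change "
    r"from (?P<old>\w+) to (?P<new>\w+)"
)


def role_changes(lines):
    """Map each shard name to its (timestamp, old_role, new_role) events, oldest first."""
    history = defaultdict(list)
    for line in lines:
        match = ROLE_CHANGE.search(line)
        if match:
            history[match["shard"]].append(
                (match["ts"], match["old"], match["new"])
            )
    # The timestamp format sorts correctly as a plain string.
    return {shard: sorted(events) for shard, events in history.items()}


def last_roles(history):
    """Return the most recent role recorded for each shard."""
    return {shard: events[-1][2] for shard, events in history.items()}
```

      Feeding it the concatenated karaf logs of one node and printing `last_roles(...)` lists which shards ended in IsolatedLeader or Candidate and when they last changed.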

        1. karaf_odl1_20170921085857.tar.xz.aa
          1000 kB
        2. karaf_odl1_20170921085857.tar.xz.ab
          1000 kB
        3. karaf_odl1_20170921085857.tar.xz.ac
          735 kB
        4. karaf_odl2_20170921090305.tar.xz.aa
          1000 kB
        5. karaf_odl2_20170921090305.tar.xz.ab
          545 kB
        6. karaf_odl3_20170921091358.tar.xz.aa
          1000 kB
        7. karaf_odl3_20170921091358.tar.xz.ab
          487 kB
        8. other.tar.xz
          42 kB
        9. xaa
          977 kB
        10. xab
          977 kB
        11. xac
          977 kB
        12. xad
          977 kB
        13. xae
          977 kB
        14. xaf
          598 kB

            Unassigned
            Ran Xiao (ran-xiao@vf.jp.nec.com)