CONTROLLER-1686 (project: controller)

Shards fail to settle after brief isolation


Details

    • Bug
    • Status: Resolved
    • Resolution: Cannot Reproduce
    • clustering
    • Operating System: All
      Platform: All

    • 8492

    Description

      This differs from other isolation bugs in that no major timeout is hit before failures start to appear. It is similar to CONTROLLER-1684 in that OutOfOrderRequestException is visible from Robot.

      The scenario uses module-based shards with the tell-based protocol, a transaction producer on each member, and a brief isolation of the original leader (member-1).
      The first suspicious messages in karaf.log [0] are the following, which repeat:
      2017-05-17 13:28:51,032 | INFO | lt-dispatcher-22 | Shard | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.Carbon | member-1-shard-default-config (Follower): The prevLogIndex 33 was found in the log but the term -1 is not equal to the append entriesprevLogTerm 2 - lastIndex: 36, snapshotIndex: 34
      2017-05-17 13:28:51,032 | INFO | lt-dispatcher-22 | Shard | 192 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.Carbon | member-1-shard-default-config (Follower): Follower is out-of-sync so sending negative reply: AppendEntriesReply [term=4, success=false, followerId=member-1-shard-default-config, logLastIndex=36, logLastTerm=2, forceInstallSnapshot=false, payloadVersion=5, raftVersion=3]
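
      For context, the two messages above come from the standard Raft consistency check a follower runs on each AppendEntries: it accepts only if its own log has an entry at prevLogIndex whose term equals prevLogTerm. The sketch below is a simplified illustration of that check, not the OpenDaylight implementation; the class FollowerLogSketch and its methods are hypothetical names. It assumes, as one possible reading of the log line, that a term of -1 means the follower can no longer look up the term because the index lies below its snapshot index (33 is below snapshotIndex 34).

```java
import java.util.TreeMap;

/** Hypothetical, simplified follower log; NOT the OpenDaylight code. */
final class FollowerLogSketch {
    private final long snapshotIndex;
    private final long snapshotTerm;
    // Entries kept after the snapshot: index -> term.
    private final TreeMap<Long, Long> termByIndex = new TreeMap<>();

    FollowerLogSketch(long snapshotIndex, long snapshotTerm) {
        this.snapshotIndex = snapshotIndex;
        this.snapshotTerm = snapshotTerm;
    }

    void append(long index, long term) {
        termByIndex.put(index, term);
    }

    long lastIndex() {
        return termByIndex.isEmpty() ? snapshotIndex : termByIndex.lastKey();
    }

    /** Term stored at the given index, or -1 when it is not available. */
    long termAt(long index) {
        if (index == snapshotIndex) {
            return snapshotTerm;
        }
        Long term = termByIndex.get(index);
        // Assumption: indexes below the snapshot have no retrievable term.
        return term == null ? -1 : term;
    }

    /** Raft consistency check: true if AppendEntries may be accepted. */
    boolean isInSync(long prevLogIndex, long prevLogTerm) {
        if (prevLogIndex == -1) {
            return true; // leader is sending from the very start of the log
        }
        if (prevLogIndex > lastIndex()) {
            return false; // follower is missing entries
        }
        return termAt(prevLogIndex) == prevLogTerm;
    }
}
```

      With snapshotIndex 34 and entries 35 and 36 in term 2, isInSync(33, 2) is false in this sketch because termAt(33) yields -1, which matches the shape of the logged mismatch. Whether the real follower reached that state the same way is not established by this report.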

      Robot gathered failure responses [1] from each member. According to karaf.log, the response from member-1 was caused by a NullPointerException:
      2017-05-17 13:29:42,277 | WARN | lt-dispatcher-25 | ConcurrentDOMDataBroker | 199 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.Carbon | Tx: DOM-CHAIN-1-2 Error during phase CAN_COMMIT, starting Abort
      java.lang.NullPointerException
      at org.opendaylight.controller.cluster.datastore.FrontendReadWriteTransaction.ensureReady(FrontendReadWriteTransaction.java:336)
      at org.opendaylight.controller.cluster.datastore.FrontendReadWriteTransaction.handleModifyTransaction(FrontendReadWriteTransaction.java:319)
      at org.opendaylight.controller.cluster.datastore.FrontendReadWriteTransaction.handleRequest(FrontendReadWriteTransaction.java:90)
      at org.opendaylight.controller.cluster.datastore.AbstractFrontendHistory.handleTransactionRequest(AbstractFrontendHistory.java:154)
      at org.opendaylight.controller.cluster.datastore.LeaderFrontendState.handleTransactionRequest(LeaderFrontendState.java:198)
      at org.opendaylight.controller.cluster.datastore.Shard.handleRequest(Shard.java:461)
      at org.opendaylight.controller.cluster.datastore.Shard.handleNonRaftCommand(Shard.java:292)
      at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:270)

      [0] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/693/archives/odl1_karaf.log.gz
      [1] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/693/archives/log.html.gz#s1-s28-t1-k2-k23-k1-k1


            People

              Unassigned Unassigned
              vrpolak Vratko Polak