[CONTROLLER-1615] prevLogIndex X was found in the log but the term -1 is not equal to the append Created: 20/Mar/17 Updated: 25/Jul/23 Resolved: 23/Mar/17 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Guy Sela | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Attachments: |
|
| External issue ID: | 8022 |
| Description |
|
Running with Carbon code of 20/03/2017. |
| Comments |
| Comment by Guy Sela [ 20/Mar/17 ] |
|
Attachment odl1.zip has been added with description: Logs of ODL1 |
| Comment by Guy Sela [ 20/Mar/17 ] |
|
Attachment odl2.zip has been added with description: Logs of ODL2 |
| Comment by Guy Sela [ 20/Mar/17 ] |
|
Attachment odl3.zip has been added with description: Logs of ODL3 |
| Comment by Guy Sela [ 20/Mar/17 ] |
|
{"request": {"mbean":"org.opendaylight.controller:Category=Shards,name=member-1-shard-topology-operational,type=DistributedOperationalDatastore","type":"read"},"value": {"ReadWriteTransactionCount":0,"SnapshotIndex":59,"InMemoryJournalLogSize":1,"ReplicatedToAllIndex":59,"Leader":"member-3-shard-topology-operational","LastIndex":60,"RaftState":"Follower","LastApplied":60,"LastCommittedTransactionTime":"1970-01-01 02:00:00.000","LastLeadershipChangeTime":"2017-03-20 15:33:17.204","LastLogIndex":60,"PeerAddresses":"member-2-shard-topology-operational: , member-3-shard-topology-operational: akka.tcp:\/\/opendaylight-cluster-data@172.16.1.153:2550\/user\/shardmanager-operational\/member-3-shard-topology-operational","FollowerInitialSyncStatus":true,"WriteOnlyTransactionCount":0,"FollowerInfo":[],"FailedReadTransactionsCount":0,"Voting":true,"StatRetrievalTime":"351.2 μs","CurrentTerm":1,"LastTerm":1,"FailedTransactionsCount":0,"PendingTxCommitQueueSize":0,"VotedFor":"member-3-shard-topology-operational","SnapshotCaptureInitiated":false,"CommittedTransactionsCount":0,"TxCohortCacheSize":0,"PeerVotingStates":"member-2-shard-topology-operational: true, member-3-shard-topology-operational: true","LastLogTerm":1,"StatRetrievalError":null,"CommitIndex":60,"SnapshotTerm":1,"AbortTransactionsCount":0,"ReadOnlyTransactionCount":0,"ShardName":"member-1-shard-topology-operational","LeadershipChangeCount":1,"InMemoryJournalDataSize":41},"timestamp":1490018472,"status":200} {"mbean":"org.opendaylight.controller:Category=Shards,name=member-2-shard-topology-operational,type=DistributedOperationalDatastore","type":"read"},"value": {"ReadWriteTransactionCount":0,"SnapshotIndex":59,"InMemoryJournalLogSize":1,"ReplicatedToAllIndex":59,"Leader":"member-3-shard-topology-operational","LastIndex":60,"RaftState":"Follower","LastApplied":60,"LastCommittedTransactionTime":"1970-01-01 02:00:00.000","LastLogIndex":60,"LastLeadershipChangeTime":"2017-03-20 15:33:17.203","PeerAddresses":"member-1-shard-topology-operational: , member-3-shard-topology-operational: akka.tcp:\/\/opendaylight-cluster-data@172.16.1.153:2550\/user\/shardmanager-operational\/member-3-shard-topology-operational","WriteOnlyTransactionCount":0,"FollowerInitialSyncStatus":true,"FollowerInfo":[],"FailedReadTransactionsCount":0,"Voting":true,"StatRetrievalTime":"2.887 ms","CurrentTerm":1,"LastTerm":1,"FailedTransactionsCount":0,"PendingTxCommitQueueSize":0,"VotedFor":"member-3-shard-topology-operational","SnapshotCaptureInitiated":false,"CommittedTransactionsCount":0,"TxCohortCacheSize":0,"PeerVotingStates":"member-1-shard-topology-operational: true, member-3-shard-topology-operational: true","LastLogTerm":1,"StatRetrievalError":null,"CommitIndex":60,"SnapshotTerm":1,"AbortTransactionsCount":0,"ReadOnlyTransactionCount":0,"ShardName":"member-2-shard-topology-operational","LeadershipChangeCount":1,"InMemoryJournalDataSize":41},"timestamp":1490018505,"status":200} {"request": {"mbean":"org.opendaylight.controller:Category=Shards,name=member-3-shard-topology-operational,type=DistributedOperationalDatastore","type":"read"},"value":{"ReadWriteTransactionCount":2,"SnapshotIndex":59,"InMemoryJournalLogSize":1347,"ReplicatedToAllIndex":59,"Leader":"member-3-shard-topology-operational","LastIndex":1397,"RaftState":"Leader","LastApplied":60,"LastCommittedTransactionTime":"2017-03-20 15:33:27.493","LastLogIndex":1397,"LastLeadershipChangeTime":"2017-03-20 15:33:17.163","PeerAddresses":"member-1-shard-topology-operational: akka.tcp:\/\/opendaylight-cluster-data@172.16.1.142:2550\/user\/shardmanager-operational\/member-1-shard-topology-operational, member-2-shard-topology-operational: akka.tcp:\/\/opendaylight-cluster-data@172.16.1.136:2550\/user\/shardmanager-operational\/member-2-shard-topology-operational","WriteOnlyTransactionCount":0,"FollowerInitialSyncStatus":false,"FollowerInfo":[ {"timeSinceLastActivity":"00:00:00.030","active":true,"matchIndex":60,"voting":true,"id":"member-1-shard-topology-operational","nextIndex":61}, {"timeSinceLastActivity":"00:00:00.036","active":true,"matchIndex":60,"voting":true,"id":"member-2-shard-topology-operational","nextIndex":61}],"FailedReadTransactionsCount":0,"Voting":true,"StatRetrievalTime":"9.821 ms","CurrentTerm":1,"LastTerm":1,"FailedTransactionsCount":0,"PendingTxCommitQueueSize":1,"VotedFor":"member-3-shard-topology-operational","SnapshotCaptureInitiated":false,"CommittedTransactionsCount":3,"TxCohortCacheSize":1,"PeerVotingStates":"member-1-shard-topology-operational: true, member-2-shard-topology-operational: true","LastLogTerm":1,"StatRetrievalError":null,"CommitIndex":60,"SnapshotTerm":1,"AbortTransactionsCount":0,"ReadOnlyTransactionCount":2219,"ShardName":"member-3-shard-topology-operational","LeadershipChangeCount":1,"InMemoryJournalDataSize":56425},"timestamp":1490018533,"status":200} |
| Comment by Guy Sela [ 20/Mar/17 ] |
|
There is another error in the logs, that is also found in the Jenkins runs, and might be related: /10.29.12.25:2550] at org.opendaylight.controller.md.sal.dom.broker.impl.TransactionCommitFailedExceptionMapper.newWithCause(TransactionCommitFailedExceptionMapper.java:37)[190:org.opendaylight.controller.sal-broker-impl:1.5.0.SNAPSHOT] |
| Comment by Tom Pantelis [ 20/Mar/17 ] |
|
More info from email: 2017-03-20 13:48:36,101 | INFO | lt-dispatcher-29 | Shard | 207 - org.opendaylight.controller.sal-clustering-commons - 1.5.0.SNAPSHOT | member-3-shard-topology-operational (Follower): The prevLogIndex 3 was found in the log but the term -1 is not equal to the append entriesprevLogTerm 1 - lastIndex: 5, snapshotIndex: 4 At that time member-3 was a follower but from the jolokia output member-3 is now the leader so something changed and we don't have the shard state from when the above messages were emitted. It would also help if you could provide the specific steps that led up the above messages at 2017-03-20 13:48:36. Specifically were nodes restarted after they were originally started and/or was any isolation testing done. |
| Comment by Guy Sela [ 20/Mar/17 ] |
|
Nothing was restarted - not the ODLs and not the OVSs. |
| Comment by Vratko Polak [ 21/Mar/17 ] |
|
In some tests, this symptom stops appearing with this [0] patch, but other symptoms appear (in general there are less csit failures). |
| Comment by Guy Sela [ 22/Mar/17 ] |
|
BFD was on, and probably caused a lot of "noise" on the system. |
| Comment by Vratko Polak [ 23/Mar/17 ] |
|
Marking as fixed, as I have not seen this in recent tests. |