[OPNFLWPLUG-1074] table stats not available after a switch flap Created: 18/Jul/19  Updated: 20/Feb/20  Resolved: 20/Feb/20

Status: Resolved
Project: OpenFlowPlugin
Component/s: inventory-manager
Affects Version/s: Neon
Fix Version/s: Sodium-SR3, Magnesium

Type: Bug Priority: High
Reporter: Luis Gomez Assignee: Somashekhar Javalagi
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

The issue was detected here:

https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-sanity-only-neon/

It is not 100% consistent but easier to reproduce with cluster:

1) Connect ovs switch to 3 node cluster
2) flap the connection: sh ovs-vsctl set-controller s1 "tcp:127.0.0.1" && ovs-vsctl set-controller s1 "tcp:10.18.130.167" "tcp:10.18.130.152" "tcp:10.18.130.162
3) observe the table stats are not filled:

{
    "nodes": {
        "node": [
            {
                "id": "openflow:1",
                "opendaylight-group-statistics:group-features": {
                    "group-capabilities-supported": [
                        "opendaylight-group-types:chaining",
                        "opendaylight-group-types:select-liveness",
                        "opendaylight-group-types:select-weight"
                    ],
                    "actions": [
                        67076097
                    ],
                    "max-groups": [
                        4294967040
                    ],
                    "group-types-supported": [
                        "opendaylight-group-types:group-all",
                        "opendaylight-group-types:group-indirect",
                        "opendaylight-group-types:group-select",
                        "opendaylight-group-types:group-ff"
                    ]
                },
                "flow-node-inventory:snapshot-gathering-status-end": {
                    "end": "2019-07-17T21:38:46.765-07:00",
                    "succeeded": true
                },
                "flow-node-inventory:snapshot-gathering-status-start": {
                    "begin": "2019-07-17T21:38:46.752-07:00"
                },
                "flow-node-inventory:port-number": 51978,
                "flow-node-inventory:serial-number": "None",
                "flow-node-inventory:table": [
                    {
                        "id": 235
                    },
                    {
                        "id": 209
                    },
                    {
                        "id": 175
                    },
                    {
                        "id": 89
                    },
                    {
                        "id": 119
                    },
                    {
                        "id": 21
                    },
                    {
                        "id": 51
                    },
                    {
                        "id": 234
                    },
                    {
                        "id": 208
                    },
                    {
                        "id": 174
                    },
                    {
                        "id": 88
                    },
                    {
                        "id": 118
                    },
                    {
                        "id": 20
                    },
                    {
                        "id": 50
                    },
                    {
                        "id": 237
                    },
                    {
                        "id": 207
                    },
                    {
                        "id": 177
                    },
                    {
                        "id": 0
                    },
                    {
                        "id": 87
                    },
                    {
                        "id": 121
                    },
                    {
                        "id": 19
                    },
                    {
                        "id": 53
                    },
                    {
                        "id": 236
                    },
                    {
                        "id": 206
                    },
                    {
                        "id": 176
                    },
                    {
                        "id": 1
                    },
                    {
                        "id": 86
                    },
                    {
                        "id": 120
                    },
                    {
                        "id": 18
                    },
                    {
                        "id": 52
                    },
                    {
                        "id": 239
                    },
                    {
                        "id": 205
                    },
                    {
                        "id": 171
                    },
                    {
                        "id": 85
                    },
                    {
                        "id": 115
                    },
                    {
                        "id": 25
                    },
                    {
                        "id": 55
                    },
                    {
                        "id": 238
                    },
                    {
                        "id": 204
                    },
                    {
                        "id": 170
                    },
                    {
                        "id": 84
                    },
                    {
                        "id": 114
                    },
                    {
                        "id": 24
                    },
                    {
                        "id": 54
                    },
                    {
                        "id": 241
                    },
                    {
                        "id": 203
                    },
                    {
                        "id": 173
                    },
                    {
                        "id": 83
                    },
                    {
                        "id": 117
                    },
                    {
                        "id": 23
                    },
                    {
                        "id": 57
                    },
                    {
                        "id": 240
                    },
                    {
                        "id": 202
                    },
                    {
                        "id": 172
                    },
                    {
                        "id": 82
                    },
                    {
                        "id": 116
                    },
                    {
                        "id": 22
                    },
                    {
                        "id": 56
                    },
                    {
                        "id": 227
                    },
                    {
                        "id": 201
                    },
                    {
                        "id": 167
                    },
                    {
                        "id": 97
                    },
                    {
                        "id": 127
                    },
                    {
                        "id": 29
                    },
                    {
                        "id": 59
                    },
                    {
                        "id": 226
                    },
                    {
                        "id": 200
                    },
                    {
                        "id": 166
                    },
                    {
                        "id": 96
                    },
                    {
                        "id": 126
                    },
                    {
                        "id": 28
                    },
                    {
                        "id": 58
                    },
                    {
                        "id": 229
                    },
                    {
                        "id": 199
                    },
                    {
                        "id": 169
                    },
                    {
                        "id": 95
                    },
                    {
                        "id": 129
                    },
                    {
                        "id": 27
                    },
                    {
                        "id": 61
                    },
                    {
                        "id": 228
                    },
                    {
                        "id": 198
                    },
                    {
                        "id": 168
                    },
                    {
                        "id": 94
                    },
                    {
                        "id": 128
                    },
                    {
                        "id": 26
                    },
                    {
                        "id": 60
                    },
                    {
                        "id": 231
                    },
                    {
                        "id": 197
                    },
                    {
                        "id": 163
                    },
                    {
                        "id": 93
                    },
                    {
                        "id": 123
                    },
                    {
                        "id": 33
                    },
                    {
                        "id": 63
                    },
                    {
                        "id": 230
                    },
                    {
                        "id": 196
                    },
                    {
                        "id": 162
                    },
                    {
                        "id": 92
                    },
                    {
                        "id": 122
                    },
                    {
                        "id": 32
                    },
                    {
                        "id": 62
                    },
                    {
                        "id": 233
                    },
                    {
                        "id": 195
                    },
                    {
                        "id": 165
                    },
                    {
                        "id": 91
                    },
                    {
                        "id": 125
                    },
                    {
                        "id": 31
                    },
                    {
                        "id": 65
                    },
                    {
                        "id": 232
                    },
                    {
                        "id": 194
                    },
                    {
                        "id": 164
                    },
                    {
                        "id": 90
                    },
                    {
                        "id": 124
                    },
                    {
                        "id": 30
                    },
                    {
                        "id": 64
                    },
                    {
                        "id": 252
                    },
                    {
                        "id": 218
                    },
                    {
                        "id": 192
                    },
                    {
                        "id": 158
                    },
                    {
                        "id": 15
                    },
                    {
                        "id": 104
                    },
                    {
                        "id": 134
                    },
                    {
                        "id": 36
                    },
                    {
                        "id": 66
                    },
                    {
                        "id": 253
                    },
                    {
                        "id": 219
                    },
                    {
                        "id": 193
                    },
                    {
                        "id": 159
                    },
                    {
                        "id": 14
                    },
                    {
                        "id": 105
                    },
                    {
                        "id": 135
                    },
                    {
                        "id": 37
                    },
                    {
                        "id": 67
                    },
                    {
                        "id": 250
                    },
                    {
                        "id": 220
                    },
                    {
                        "id": 190
                    },
                    {
                        "id": 160
                    },
                    {
                        "id": 17
                    },
                    {
                        "id": 102
                    },
                    {
                        "id": 136
                    },
                    {
                        "id": 34
                    },
                    {
                        "id": 68
                    },
                    {
                        "id": 251
                    },
                    {
                        "id": 221
                    },
                    {
                        "id": 191
                    },
                    {
                        "id": 161
                    },
                    {
                        "id": 16
                    },
                    {
                        "id": 103
                    },
                    {
                        "id": 137
                    },
                    {
                        "id": 35
                    },
                    {
                        "id": 69
                    },
                    {
                        "id": 222
                    },
                    {
                        "id": 188
                    },
                    {
                        "id": 154
                    },
                    {
                        "id": 11
                    },
                    {
                        "id": 100
                    },
                    {
                        "id": 130
                    },
                    {
                        "id": 40
                    },
                    {
                        "id": 70
                    },
                    {
                        "id": 223
                    },
                    {
                        "id": 189
                    },
                    {
                        "id": 155
                    },
                    {
                        "id": 10
                    },
                    {
                        "id": 101
                    },
                    {
                        "id": 131
                    },
                    {
                        "id": 41
                    },
                    {
                        "id": 71
                    },
                    {
                        "id": 224
                    },
                    {
                        "id": 186
                    },
                    {
                        "id": 156
                    },
                    {
                        "id": 13
                    },
                    {
                        "id": 98
                    },
                    {
                        "id": 132
                    },
                    {
                        "id": 38
                    },
                    {
                        "id": 72
                    },
                    {
                        "id": 225
                    },
                    {
                        "id": 187
                    },
                    {
                        "id": 157
                    },
                    {
                        "id": 12
                    },
                    {
                        "id": 99
                    },
                    {
                        "id": 133
                    },
                    {
                        "id": 39
                    },
                    {
                        "id": 73
                    },
                    {
                        "id": 244
                    },
                    {
                        "id": 210
                    },
                    {
                        "id": 184
                    },
                    {
                        "id": 150
                    },
                    {
                        "id": 7
                    },
                    {
                        "id": 112
                    },
                    {
                        "id": 142
                    },
                    {
                        "id": 44
                    },
                    {
                        "id": 74
                    },
                    {
                        "id": 245
                    },
                    {
                        "id": 211
                    },
                    {
                        "id": 185
                    },
                    {
                        "id": 151
                    },
                    {
                        "id": 6
                    },
                    {
                        "id": 113
                    },
                    {
                        "id": 143
                    },
                    {
                        "id": 45
                    },
                    {
                        "id": 75
                    },
                    {
                        "id": 242
                    },
                    {
                        "id": 212
                    },
                    {
                        "id": 182
                    },
                    {
                        "id": 152
                    },
                    {
                        "id": 9
                    },
                    {
                        "id": 110
                    },
                    {
                        "id": 144
                    },
                    {
                        "id": 42
                    },
                    {
                        "id": 76
                    },
                    {
                        "id": 243
                    },
                    {
                        "id": 213
                    },
                    {
                        "id": 183
                    },
                    {
                        "id": 153
                    },
                    {
                        "id": 8
                    },
                    {
                        "id": 111
                    },
                    {
                        "id": 145
                    },
                    {
                        "id": 43
                    },
                    {
                        "id": 77
                    },
                    {
                        "id": 248
                    },
                    {
                        "id": 214
                    },
                    {
                        "id": 180
                    },
                    {
                        "id": 146
                    },
                    {
                        "id": 3
                    },
                    {
                        "id": 108
                    },
                    {
                        "id": 138
                    },
                    {
                        "id": 48
                    },
                    {
                        "id": 78
                    },
                    {
                        "id": 249
                    },
                    {
                        "id": 215
                    },
                    {
                        "id": 181
                    },
                    {
                        "id": 147
                    },
                    {
                        "id": 2
                    },
                    {
                        "id": 109
                    },
                    {
                        "id": 139
                    },
                    {
                        "id": 49
                    },
                    {
                        "id": 79
                    },
                    {
                        "id": 246
                    },
                    {
                        "id": 216
                    },
                    {
                        "id": 178
                    },
                    {
                        "id": 148
                    },
                    {
                        "id": 5
                    },
                    {
                        "id": 106
                    },
                    {
                        "id": 140
                    },
                    {
                        "id": 46
                    },
                    {
                        "id": 80
                    },
                    {
                        "id": 247
                    },
                    {
                        "id": 217
                    },
                    {
                        "id": 179
                    },
                    {
                        "id": 149
                    },
                    {
                        "id": 4
                    },
                    {
                        "id": 107
                    },
                    {
                        "id": 141
                    },
                    {
                        "id": 47
                    },
                    {
                        "id": 81
                    }
                ],
                "flow-node-inventory:hardware": "Open vSwitch",
                "flow-node-inventory:description": "None",
                "flow-node-inventory:software": "2.8.1",
                "flow-node-inventory:switch-features": {
                    "max_buffers": 0,
                    "max_tables": 254,
                    "capabilities": [
                        "flow-node-inventory:flow-feature-capability-flow-stats",
                        "flow-node-inventory:flow-feature-capability-queue-stats",
                        "flow-node-inventory:flow-feature-capability-port-stats",
                        "flow-node-inventory:flow-feature-capability-table-stats",
                        "flow-node-inventory:flow-feature-capability-group-stats"
                    ]
                },
                "flow-node-inventory:manufacturer": "Nicira, Inc.",
                "flow-node-inventory:ip-address": "10.18.130.33",
                "node-connector": [
                    {
                        "id": "openflow:1:1",
                        "flow-node-inventory:supported": "",
                        "flow-node-inventory:peer-features": "",
                        "flow-node-inventory:port-number": 1,
                        "flow-node-inventory:advertised-features": "",
                        "flow-node-inventory:hardware-address": "b2:ca:1c:d5:c5:19",
                        "flow-node-inventory:current-feature": "ten-gb-fd copper",
                        "flow-node-inventory:current-speed": 10000000,
                        "flow-node-inventory:configuration": "",
                        "flow-node-inventory:maximum-speed": 0,
                        "flow-node-inventory:name": "s1-eth1",
                        "flow-node-inventory:state": {
                            "blocked": false,
                            "link-down": false,
                            "live": true
                        },
                        "opendaylight-port-statistics:flow-capable-node-connector-statistics": {
                            "receive-frame-error": 0,
                            "packets": {
                                "received": 8,
                                "transmitted": 543
                            },
                            "collision-count": 0,
                            "transmit-errors": 0,
                            "bytes": {
                                "received": 648,
                                "transmitted": 46123
                            },
                            "duration": {
                                "nanosecond": 939000000,
                                "second": 2813
                            },
                            "receive-crc-error": 0,
                            "receive-drops": 0,
                            "receive-errors": 0,
                            "receive-over-run-error": 0,
                            "transmit-drops": 0
                        }
                    },
                    {
                        "id": "openflow:1:LOCAL",
                        "flow-node-inventory:supported": "",
                        "flow-node-inventory:peer-features": "",
                        "flow-node-inventory:port-number": 4294967294,
                        "flow-node-inventory:advertised-features": "",
                        "flow-node-inventory:hardware-address": "4a:6c:f5:d9:e7:4e",
                        "flow-node-inventory:current-feature": "",
                        "flow-node-inventory:current-speed": 0,
                        "flow-node-inventory:configuration": "PORT-DOWN",
                        "flow-node-inventory:maximum-speed": 0,
                        "flow-node-inventory:name": "s1",
                        "flow-node-inventory:state": {
                            "blocked": false,
                            "link-down": true,
                            "live": false
                        },
                        "opendaylight-port-statistics:flow-capable-node-connector-statistics": {
                            "receive-frame-error": 0,
                            "packets": {
                                "received": 0,
                                "transmitted": 0
                            },
                            "collision-count": 0,
                            "transmit-errors": 0,
                            "bytes": {
                                "received": 0,
                                "transmitted": 0
                            },
                            "duration": {
                                "nanosecond": 936000000,
                                "second": 2813
                            },
                            "receive-crc-error": 0,
                            "receive-drops": 0,
                            "receive-errors": 0,
                            "receive-over-run-error": 0,
                            "transmit-drops": 0
                        }
                    }
                ]
            }
        ]
    }
}


 Comments   
Comment by Luis Gomez [ 18/Jul/19 ]

BTW the issue is also in master (sodium) if that helps with the troubleshooting.

Comment by Somashekhar Javalagi [ 22/Jul/19 ]

Hi ecelgp

I tried several times both on sodium as well as neon, but not able to reproduce the issue. which all features you installed for the controllers?

Comment by Somashekhar Javalagi [ 30/Jul/19 ]

Hi ecelgp

Is there any particular combination of configurations like number of vms o be present in the compute to achieve this?

Comment by Luis Gomez [ 15/Aug/19 ]

Neon jobs do not show the problem anymore but Sodium do:

https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-flow-services-all-sodium

I will double-check tomorrow and come back with the results.

Comment by Luis Gomez [ 16/Aug/19 ]

This seems hard to reproduce outside ODL CI. I will lower priority for now until I find more consistent way to reproduce.

Comment by Luis Gomez [ 09/Sep/19 ]

What is worrying is that this issue shows sporadically in sodium and not in other branches now:

https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-sanity-only-sodium/

I setup a test run every 30 min in sandbox:

https://jenkins.opendaylight.org/sandbox/job/openflowplugin-csit-1node-sanity-only-sodium

Would it be possible to get a DEBUG for this so I can add it to the sandbox?

Comment by Anil Vishnoi [ 09/Sep/19 ]

ecelgp you can enable the following logs to see the connection activity and stats collection activity. That should give us further clue on what's happening with the table stats collection.

 

org.opendaylight.openflowplugin.impl.connection

org.opendaylight.openflowplugin.impl.statistics

 

Comment by Jamo Luhrsen [ 09/Sep/19 ]

Can you add a non-sodium job to sandbox as well to run every 30m just to completely prove it's only a regression in Sodium. that would come in to play if we need to decide to release with this problem or not.

Comment by Luis Gomez [ 10/Sep/19 ]

OK, after analyzing the traces + OVS log (running locally), I think this could be reason or at least adding to the issue:

https://git.opendaylight.org/gerrit/#/c/integration/test/+/84284

Comment by Luis Gomez [ 10/Sep/19 ]

On the other hand longevity test for neon & sodium fail with this bug symptom:

https://jenkins.opendaylight.org/sandbox/job/openflowplugin-csit-1node-longevity-only-sodium/1/

Looking at the DEBUG log, nothing really pops up.

Comment by Jamo Luhrsen [ 10/Sep/19 ]

the ovs crash does seem worrisome, but that would not explain why this happens in sodium but not other
branches, right?

Comment by Luis Gomez [ 10/Sep/19 ]

I think the issue under investigation is happening in both neon and sodium (the later more often). The OVS crash somehow helped with the problem as after removing the test cases producing the OVS crash I do not see the issue in the sanity check:

https://jenkins.opendaylight.org/sandbox/job/openflowplugin-csit-1node-sanity-only-sodium/

Now I am more investigating the longevity test because there is no OVS crash there and still the issue shows up time to time.

Comment by Somashekhar Javalagi [ 11/Sep/19 ]

ecelgp

I did testing in my local setup where I added some delay after receiving flow table stats before writing it to the datastore. With that I am able to get operational datastore entry without flow stats. So both writing node information and writing flow stats to datastore are happening in two different transactions. Before we didn't see this issue, because after https://git.opendaylight.org/gerrit/#/c/openflowplugin/+/80434/ change onwards, we are writing node information to datastore only after mastership is acquired. So with this change both transactions are delayed for sometime after switch connection. As both transactions can go one after the other with very little delay, there is less chance of this issue getting hit.

We have to try using some mechanism to submit transaction for the first time only after flow table stats is collected. 

Comment by Luis Gomez [ 11/Sep/19 ]

OK you are saying that now there are 2 separate transactions to write node and table stats info in oper inventory and these 2 transactions can be delayed until the mastership role is resolved. Apart from that, do you see any other issue/scenario where master is resolved and the table stats are not written for long time (>10 secs), because this is what happens in this test:

https://jenkins.opendaylight.org/sandbox/job/openflowplugin-csit-1node-longevity-only-sodium/1/

The test waits from 3:48:10 to 3:48:20 (see robot log) and during that time openflow:14 receives not just one but multiple OFPMPTABLE, however when polling oper inventory, table stats shows empty:

2019-09-10T03:48:07,743 | DEBUG | epollEventLoopGroup-9-6 | HandshakeListenerImpl            | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | succeeded by getting sweep barrier after post-handshake for device openflow:14
2019-09-10T03:48:07,743 | INFO  | epollEventLoopGroup-9-6 | ContextChainHolderImpl           | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Device openflow:14 connected.
2019-09-10T03:48:07,744 | INFO  | epollEventLoopGroup-9-6 | ContextChainHolderImpl           | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | No context chain found for device: openflow:14, creating new.
2019-09-10T03:48:07,744 | INFO  | epollEventLoopGroup-9-6 | DeviceManagerImpl                | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | ConnectionEvent: Device connected to controller, Device:/10.30.170.172:40536, NodeId:Uri{_value=openflow:14}
2019-09-10T03:48:07,786 | INFO  | epollEventLoopGroup-9-6 | RoleContextImpl                  | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Started timer for setting SLAVE role on device openflow:14 if no role will be set in 20s.
2019-09-10T03:48:07,837 | INFO  | opendaylight-cluster-data-akka.actor.default-dispatcher-38 | DeviceOwnershipServiceImpl       | 313 - org.opendaylight.openflowplugin.applications.device-ownership-service - 0.9.0.SNAPSHOT | Entity ownership change received for node : openflow:14 : EntityOwnershipChange [entity=Entity [type=org.opendaylight.mdsal.ServiceEntityType, id=KeyedInstanceIdentifier{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.mdsal.core.general.entity.rev150930.Entity, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.mdsal.core.general.entity.rev150930.Entity[key=EntityKey{_name=openflow:14}]]}], state=LOCAL_OWNERSHIP_GRANTED [wasOwner=false, isOwner=true, hasOwner=true], inJeopardy=false]
2019-09-10T03:48:08,104 | INFO  | opendaylight-cluster-data-akka.actor.default-dispatcher-44 | GuardedContextImpl               | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Starting DeviceContextImpl[NEW] service for node openflow:14
2019-09-10T03:48:08,104 | INFO  | opendaylight-cluster-data-akka.actor.default-dispatcher-44 | GuardedContextImpl               | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Starting RpcContextImpl[NEW] service for node openflow:14
2019-09-10T03:48:08,147 | INFO  | opendaylight-cluster-data-akka.actor.default-dispatcher-44 | GuardedContextImpl               | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Starting StatisticsContextImpl[NEW] service for node openflow:14
2019-09-10T03:48:08,147 | INFO  | opendaylight-cluster-data-akka.actor.default-dispatcher-44 | GuardedContextImpl               | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Starting RoleContextImpl[NEW] service for node openflow:14
2019-09-10T03:48:08,148 | INFO  | opendaylight-cluster-data-akka.actor.default-dispatcher-44 | SalRoleServiceImpl               | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | SetRole called with input:SetRoleInput{_controllerRole=BECOMEMASTER, _node=NodeRef{_value=KeyedInstanceIdentifier{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey{_id=Uri{_value=openflow:14}}]]}}, augmentation=[]}
2019-09-10T03:48:08,148 | INFO  | opendaylight-cluster-data-akka.actor.default-dispatcher-44 | SalRoleServiceImpl               | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | RoleChangeTask called on device:openflow:14 OFPRole:BECOMEMASTER
2019-09-10T03:48:08,149 | INFO  | opendaylight-cluster-data-akka.actor.default-dispatcher-44 | RoleService                      | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | getGenerationIdFromDevice called for device: openflow:14
2019-09-10T03:48:08,149 | INFO  | opendaylight-cluster-data-akka.actor.default-dispatcher-44 | ContextChainImpl                 | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Started clustering services for node openflow:14
2019-09-10T03:48:08,152 | INFO  | epollEventLoopGroup-9-6 | RoleService                      | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | submitRoleChange called for device:Uri{_value=openflow:14}, role:BECOMEMASTER
2019-09-10T03:48:08,167 | INFO  | epollEventLoopGroup-9-6 | RoleService                      | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | submitRoleChange onSuccess for device:Uri{_value=openflow:14}, role:BECOMEMASTER
2019-09-10T03:48:08,168 | INFO  | ofppool-2        | FlowNodeReconciliationImpl       | 314 - org.opendaylight.openflowplugin.applications.forwardingrules-manager - 0.9.0.SNAPSHOT | Triggering reconciliation for device NodeKey{_id=Uri{_value=openflow:14}}
2019-09-10T03:48:08,211 | INFO  | ListenableFutureAdapter-thread-6 | ContextChainHolderImpl           | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Device openflow:14 connection is enabled by reconciliation framework.
2019-09-10T03:48:08,217 | INFO  | epollEventLoopGroup-9-6 | DeviceInitializationUtil         | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | IP address of the node openflow:14 is: IpAddress{_ipv4Address=Ipv4Address{_value=10.30.170.172}}
2019-09-10T03:48:08,217 | INFO  | epollEventLoopGroup-9-6 | DeviceInitializationUtil         | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Port number of the node openflow:14 is: 40536
2019-09-10T03:48:08,740 | INFO  | epollEventLoopGroup-9-6 | OF13DeviceInitializer            | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Static node openflow:14 info: OFPMPMETERFEATURES collected
2019-09-10T03:48:08,771 | INFO  | epollEventLoopGroup-9-6 | OF13DeviceInitializer            | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Static node openflow:14 info: OFPMPGROUPFEATURES collected
2019-09-10T03:48:08,829 | INFO  | epollEventLoopGroup-9-6 | OF13DeviceInitializer            | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Static node openflow:14 info: OFPMPPORTDESC collected
2019-09-10T03:48:08,842 | INFO  | epollEventLoopGroup-9-6 | OF13DeviceInitializer            | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Static node openflow:14 successfully finished collecting
2019-09-10T03:48:08,991 | DEBUG | ListenableFutureAdapter-thread-6 | StatisticsContextImpl            | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Status of previous stat iteration for node openflow:14: true
2019-09-10T03:48:08,992 | DEBUG | ListenableFutureAdapter-thread-6 | StatisticsContextImpl            | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Stats iterating to next type for node openflow:14 of type OFPMPTABLE
2019-09-10T03:48:08,992 | DEBUG | ListenableFutureAdapter-thread-6 | StatisticsGatheringService       | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Getting statistics for node Uri{_value=openflow:14} of type OFPMPTABLE
2019-09-10T03:48:09,177 | DEBUG | ofppool-8        | StatisticsGatheringUtils         | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Stats reply successfully received for node Uri{_value=openflow:14} of type OFPMPTABLE
2019-09-10T03:48:09,288 | INFO  | ListenableFutureAdapter-thread-6 | ContextChainImpl                 | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Device openflow:14 is able to work as master
2019-09-10T03:48:09,289 | INFO  | ListenableFutureAdapter-thread-6 | ContextChainHolderImpl           | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Role MASTER was granted to device openflow:14
2019-09-10T03:48:09,373 | INFO  | ListenableFutureAdapter-thread-6 | DeviceManagerImpl                | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Publishing node added notification for Uri{_value=openflow:14}
2019-09-10T03:48:09,373 | INFO  | ListenableFutureAdapter-thread-6 | StatisticsContextImpl            | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Starting statistics gathering for node openflow:14
2019-09-10T03:48:09,404 | DEBUG | ofppool-8        | StatisticsGatheringUtils         | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Stats reply added to transaction for node Uri{_value=openflow:14} of type OFPMPTABLE
...
2019-09-10T03:48:12,505 | DEBUG | StatisticsPollingService RUNNING | StatisticsContextImpl            | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Stats iterating to next type for node openflow:14 of type OFPMPTABLE
2019-09-10T03:48:12,506 | DEBUG | StatisticsPollingService RUNNING | StatisticsGatheringService       | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Getting statistics for node Uri{_value=openflow:14} of type OFPMPTABLE
2019-09-10T03:48:12,552 | DEBUG | ofppool-12       | StatisticsGatheringUtils         | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Stats reply successfully received for node Uri{_value=openflow:14} of type OFPMPTABLE
2019-09-10T03:48:12,659 | DEBUG | ofppool-12       | StatisticsGatheringUtils         | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Stats reply added to transaction for node Uri{_value=openflow:14} of type OFPMPTABLE
...
2019-09-10T03:48:15,754 | DEBUG | StatisticsPollingService RUNNING | StatisticsContextImpl            | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Stats iterating to next type for node openflow:14 of type OFPMPTABLE
2019-09-10T03:48:15,754 | DEBUG | StatisticsPollingService RUNNING | StatisticsGatheringService       | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Getting statistics for node Uri{_value=openflow:14} of type OFPMPTABLE
2019-09-10T03:48:15,757 | DEBUG | ofppool-12       | StatisticsGatheringUtils         | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Stats reply successfully received for node Uri{_value=openflow:14} of type OFPMPTABLE
2019-09-10T03:48:15,764 | DEBUG | ofppool-12       | StatisticsGatheringUtils         | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Stats reply added to transaction for node Uri{_value=openflow:14} of type OFPMPTABLE
Comment by Jamo Luhrsen [ 12/Sep/19 ]

Somashekhar and ecelgp, this is the last bug we are waiting to figure out before we can release the now late Sodium release

Comment by Luis Gomez [ 12/Sep/19 ]

In general the issue does not seem easy to reproduce, unless you have an abnormal event like an OVS crash or you run multiple switch disconnect+reconnect for long time (longevity test), so maybe we can release sodium and work more on this after. This is my opinion but we can also hear Somashekhar and Avishnoi opinion too.

Comment by Somashekhar Javalagi [ 13/Sep/19 ]

Ideally after this,
2019-09-10T03:48:09,404 | DEBUG | ofppool-8 | StatisticsGatheringUtils | 325 - org.opendaylight.openflowplugin.impl - 0.9.0.SNAPSHOT | Stats reply added to transaction for node Uri{_value=openflow:14} of type OFPMPTABLE
the stats should be committed  and should be available in datastore. May be this transaction processing took more time.

I have raised a patch to collect stats during device initialization where node info and flow-table-stats will be committed in a single transaction.

Comment by Anil Vishnoi [ 13/Sep/19 ]

ecelgp In the longevity test, do table stats never shows up in the operational data store? If it shows up, but with some delay (2-3 stats cycle), than in my opinion it's not a blocker situation, but if stats never shows up even after waiting for 5 minutes or more with stable connection, then it must be an issue with stats commit to datastore. This can be bit critical, because i assume in that scenario, none of the other statistics (port, flow,group) should show up in the datastore. did you see that only table stats are missing but rest all stats are there in the data store (atleast port stats)?

If it's transaction level issue where none of stats shows up in the datastore is a critical issue in my opinion. The only resonable workaround here is to disconnect the device and connect back, and after that workaround if stats collection still doesn't work, i would like to keep this bug as a blocker bug, but if it recovers, i would put it as a non-blocker and release note the issue with the workaround.

Somashekhar The patch that you pushed can have minor impact on the scale, as the traffic between switch-controller will increase at the time of onbaording. Also delay in table stats is not really a critical issue (obviously if its not because of non-performance logic) because these are statistics counters only. 

Comment by Luis Gomez [ 14/Sep/19 ]

OK let me do couple of things:

  • Increase the test timeout so we know if table stats are published at all.
  • Increase the test detail level so we know exactly what is missing.

I will come back when I test with the above.

Comment by Luis Gomez [ 15/Sep/19 ]

It seems like in a normal scenario (no OVS crash) the table stats are reported correctly after increasing the wait time from 10 secs to 30 secs. This means we can reduce the criticality of this bug and release sodium:

https://jenkins.opendaylight.org/sandbox/job/openflowplugin-csit-1node-longevity-only-sodium/

Comment by Luis Gomez [ 18/Nov/19 ]

I think Magnesium brought the issue back:

https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-flow-services-all-magnesium/

We have to investigate what is going on here.

Comment by Luis Gomez [ 20/Nov/19 ]

So here are the DEBUG logs for the failure:
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/builder-copy-sandbox-logs/694/openflowplugin-csit-1node-sanity-only-magnesium/25/

According to test report below:
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/builder-copy-sandbox-logs/694/openflowplugin-csit-1node-sanity-only-magnesium/25/robot-plugin/log.html.gz#s1-s3-s1

The test is waiting between 17:36:55.816 and 17:37:05.067 for table stats. However these stats are not present in oper datastore.

Looking at the karaf.log at the same time:
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/builder-copy-sandbox-logs/694/openflowplugin-csit-1node-sanity-only-magnesium/25/odl_1/odl1_karaf.log.gz

It seems like multiple table stats are received while the test is waiting:

2019-11-20T01:36:55,726 | DEBUG | ListenableFutureAdapter-thread-0 | StatisticsContextImpl            | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Status of previous stat iteration for node openflow:1: true
2019-11-20T01:36:55,727 | DEBUG | ListenableFutureAdapter-thread-0 | StatisticsContextImpl            | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Stats iterating to next type for node openflow:1 of type OFPMPTABLE
2019-11-20T01:36:55,727 | DEBUG | ListenableFutureAdapter-thread-0 | StatisticsGatheringService       | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Getting statistics for node Uri{_value=openflow:1} of type OFPMPTABLE
2019-11-20T01:36:55,730 | DEBUG | ofppool-1        | StatisticsGatheringUtils         | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Stats reply successfully received for node Uri{_value=openflow:1} of type OFPMPTABLE
2019-11-20T01:36:55,731 | DEBUG | ofppool-1        | AbstractMultipartWriter          | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Writing multipart data of type interface org.opendaylight.yang.gen.v1.urn.opendaylight.flow.table.statistics.rev131215.FlowTableAndStatisticsMap for node KeyedInstanceIdentifier{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey{_id=Uri{_value=openflow:1}}]]}
2019-11-20T01:36:55,735 | DEBUG | ListenableFutureAdapter-thread-0 | ContextChainImpl                 | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Device openflow:1, initial submit OK.
2019-11-20T01:36:55,735 | INFO  | ListenableFutureAdapter-thread-0 | ContextChainImpl                 | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Device openflow:1 is able to work as master
2019-11-20T01:36:55,735 | INFO  | ListenableFutureAdapter-thread-0 | ContextChainHolderImpl           | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Role MASTER was granted to device openflow:1
2019-11-20T01:36:55,735 | INFO  | ListenableFutureAdapter-thread-0 | DeviceManagerImpl                | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Publishing node added notification for Uri{_value=openflow:1}
2019-11-20T01:36:55,736 | INFO  | ListenableFutureAdapter-thread-0 | StatisticsContextImpl            | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Starting statistics gathering for node openflow:1
2019-11-20T01:36:55,738 | INFO  | opendaylight-cluster-data-notification-dispatcher-58 | ConnectionManagerImpl            | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Clearing the device connection timer for the device 1
2019-11-20T01:36:55,753 | DEBUG | ofppool-1        | StatisticsGatheringUtils         | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Stats reply added to transaction for node Uri{_value=openflow:1} of type OFPMPTABLE


2019-11-20T01:36:58,739 | DEBUG | StatisticsPollingService RUNNING | StatisticsContextImpl            | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Status of previous stat iteration for node openflow:1: true
2019-11-20T01:36:58,739 | DEBUG | StatisticsPollingService RUNNING | StatisticsContextImpl            | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Stats iterating to next type for node openflow:1 of type OFPMPTABLE
2019-11-20T01:36:58,739 | DEBUG | StatisticsPollingService RUNNING | StatisticsGatheringService       | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Getting statistics for node Uri{_value=openflow:1} of type OFPMPTABLE
2019-11-20T01:36:58,742 | DEBUG | ofppool-1        | StatisticsGatheringUtils         | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Stats reply successfully received for node Uri{_value=openflow:1} of type OFPMPTABLE
2019-11-20T01:36:58,742 | DEBUG | ofppool-1        | AbstractMultipartWriter          | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Writing multipart data of type interface org.opendaylight.yang.gen.v1.urn.opendaylight.flow.table.statistics.rev131215.FlowTableAndStatisticsMap for node KeyedInstanceIdentifier{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey{_id=Uri{_value=openflow:1}}]]}
2019-11-20T01:36:58,748 | DEBUG | ofppool-1        | StatisticsGatheringUtils         | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Stats reply added to transaction for node Uri{_value=openflow:1} of type OFPMPTABLE
20

2019-11-20T01:37:01,774 | DEBUG | StatisticsPollingService RUNNING | StatisticsContextImpl            | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Status of previous stat iteration for node openflow:1: true
2019-11-20T01:37:01,775 | DEBUG | StatisticsPollingService RUNNING | StatisticsContextImpl            | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Stats iterating to next type for node openflow:1 of type OFPMPTABLE
2019-11-20T01:37:01,775 | DEBUG | StatisticsPollingService RUNNING | StatisticsGatheringService       | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Getting statistics for node Uri{_value=openflow:1} of type OFPMPTABLE
2019-11-20T01:37:01,779 | DEBUG | ofppool-1        | StatisticsGatheringUtils         | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Stats reply successfully received for node Uri{_value=openflow:1} of type OFPMPTABLE
2019-11-20T01:37:01,779 | DEBUG | ofppool-1        | AbstractMultipartWriter          | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Writing multipart data of type interface org.opendaylight.yang.gen.v1.urn.opendaylight.flow.table.statistics.rev131215.FlowTableAndStatisticsMap for node KeyedInstanceIdentifier{targetType=interface org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node, path=[org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes, org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node[key=NodeKey{_id=Uri{_value=openflow:1}}]]}
2019-11-20T01:37:01,793 | DEBUG | ofppool-1        | StatisticsGatheringUtils         | 317 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Stats reply added to transaction for node Uri{_value=openflow:1} of type OFPMPTABLE

Comment by Somashekhar Javalagi [ 25/Nov/19 ]

with the debug logs added for gerrit(https://git.opendaylight.org/gerrit/#/c/openflowplugin/+/82676/), I triggered csit and I am seeing that table stats(OFPMP_TABLE) is added to transaction and same is committed to operational inventory. I had added logs to confirm when the transaction commit is successful and completed. Though I see as txn commit is complete on same second(2019-11-25T07:28:44), not understanding why test stats are not available in datastore for test script.

 

https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/openflowplugin-csit-1node-gate-flow-services-all-magnesium/33/robot-plugin/log.html#s1-s2-s3-t4

table stats check time: 20191125 12:59:05.804 / 20191125 12:59:05.806 / 00:00:00.002

 

2019-11-25T07:28:44,686 | INFO  | epollEventLoopGroup-18-1 | ContextChainHolderImpl           | 436 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Device openflow:3 connected.

 

2019-11-25T07:28:44,760 | INFO  | ListenableFutureAdapter-thread-1 | TransactionChainManager          | 432 - org.opendaylight.openflowplugin.common - 0.10.0.SNAPSHOT | enabled submit transaction for device: openflow:3, submitIsEnabled: true

 

2019-11-25T07:28:44,765 | INFO  | ofppool-4        | StatisticsGatheringUtils         | 436 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Stats reply successfully received for node Uri{_value=openflow:3} of type OFPMPTABLE

 

2019-11-25T07:28:44,770 | INFO  | ListenableFutureAdapter-thread-1 | StatisticsContextImpl            | 436 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Starting statistics gathering for node openflow:3

 

2019-11-25T07:28:44,803 | INFO  | ofppool-4        | StatisticsGatheringUtils         | 436 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Stats reply added to transaction for node Uri{_value=openflow:3} of type OFPMPTABLE

 

2019-11-25T07:28:44,803 | INFO  | ofppool-4        | TransactionChainManager          | 432 - org.opendaylight.openflowplugin.common - 0.10.0.SNAPSHOT | Transaction for device openflow:3 is submitted to the datastore. isInitialiCommit: false, txn: org.opendaylight.mdsal.binding.dom.adapter.BindingDOMTransactionChainAdapter$2@5ada48c5

 

2019-11-25T07:28:44,830 | INFO  | CommitFutures-0  | TransactionChainManager          | 432 - org.opendaylight.openflowplugin.common - 0.10.0.SNAPSHOT | Transaction commit successful. txn: org.opendaylight.mdsal.binding.dom.adapter.BindingDOMTransactionChainAdapter$2@5ada48c5

 

Comment by Luis Gomez [ 27/Nov/19 ]

Yep, nothing in the log indicates a problem but you can see in the robot log, the table stats are missing. Is it possible to dump the content of OFPMTABLE message and/or what is being written to oper DS?

Comment by Luis Gomez [ 17/Dec/19 ]

OK I enabled DS write dump in this test:

https://jenkins.opendaylight.org/releng/job/openflowplugin-csit-1node-gate-flow-services-all-magnesium/40/

https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/openflowplugin-csit-1node-gate-flow-services-all-magnesium/40/robot-plugin/log.html.gz#s1-s3

https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/openflowplugin-csit-1node-gate-flow-services-all-magnesium/40/odl_1/odl1_karaf.log.gz

You can see in the karaf log above that stats are initially written:

2019-12-17T05:47:14,067 | INFO  | StatisticsPollingService RUNNING | StatisticsContextImpl            | 439 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Status of previous stat iteration for node openflow:1: true
2019-12-17T05:47:14,067 | INFO  | StatisticsPollingService RUNNING | StatisticsContextImpl            | 439 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Stats iterating to next type for node openflow:1 of type OFPMPTABLE
2019-12-17T05:47:14,067 | INFO  | StatisticsPollingService RUNNING | StatisticsGatheringService       | 439 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Getting statistics for node Uri{_value=openflow:1} of type OFPMPTABLE
2019-12-17T05:47:14,068 | INFO  | CommitFutures-2  | TransactionChainManager          | 435 - org.opendaylight.openflowplugin.common - 0.10.0.SNAPSHOT | Transaction commit successful. txn: org.opendaylight.mdsal.binding.dom.adapter.BindingDOMTransactionChainAdapter$2@6d758ffa
2019-12-17T05:47:14,070 | INFO  | ofppool-1        | StatisticsGatheringUtils         | 439 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Stats reply successfully received for node Uri{_value=openflow:1} of type OFPMPTABLE

2019-12-17T05:47:14,070 | DEBUG | ofppool-1        | SnapshotBackedWriteTransaction   | 369 - org.opendaylight.mdsal.dom-spi - 5.0.6 | Write Tx: member-1-datastore-operational-fe-0-chn-19-txn-7-0 allocated with snapshot org.opendaylight.yangtools.yang.data.api.schema.tree.spi.Version@2c058e58

2019-12-17T05:47:14,070 | DEBUG | ofppool-1        | SnapshotBackedWriteTransaction   | 369 - org.opendaylight.mdsal.dom-spi - 5.0.6 | Tx: member-1-datastore-operational-fe-0-chn-19-txn-7-0 Write: /(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:1}]/AugmentationIdentifier{childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions, (urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features, (urn:opendaylight:flow:inventory?revision=2013-08-19)port-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer, (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware, (urn:opendaylight:flow:inventory?revision=2013-08-19)software, (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)description, (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address, (urn:opendaylight:flow:inventory?revision=2013-08-19)meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)group, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group, (urn:opendaylight:flow:inventory?revision=2013-08-19)table, (urn:opendaylight:flow:inventory?revision=2013-08-19)table-features]}/(urn:opendaylight:flow:inventory?revision=2013-08-19)table/table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=0}]/AugmentationIdentifier{childNames=[(urn:opendaylight:flow:table:statistics?revision=2013-12-15)flow-table-statistics]}/(urn:opendaylight:flow:table:statistics?revision=2013-12-15)flow-table-statistics:ImmutableContainerNode{identifier=(urn:opendaylight:flow:table:statistics?revision=2013-12-15)flow-table-statistics, value=[ImmutableLeafNode{identifier=(urn:opendaylight:flow:table:statistics?revision=2013-12-15)active-flows, value=1}, ImmutableLeafNode{identifier=(urn:opendaylight:flow:table:statistics?revision=2013-12-15)packets-looked-up, value=10}, ImmutableLeafNode{identifier=(urn:opendaylight:flow:table:statistics?revision=2013-12-15)packets-matched, value=9}]}

But then they are cleared by the flow stats writing:

2019-12-17T05:47:14,135 | DEBUG | opendaylight-cluster-data-akka.actor.default-dispatcher-27 | SnapshotBackedWriteTransaction   | 369 - org.opendaylight.mdsal.dom-spi - 5.0.6 | Write Tx: member-1-datastore-operational-fe-0-chn-19-txn-8-0 allocated with snapshot org.opendaylight.yangtools.yang.data.api.schema.tree.spi.Version@5db306a9

2019-12-17T05:47:14,136 | DEBUG | opendaylight-cluster-data-akka.actor.default-dispatcher-27 | SnapshotBackedWriteTransaction   | 369 - org.opendaylight.mdsal.dom-spi - 5.0.6 | Tx: member-1-datastore-operational-fe-0-chn-19-txn-8-0 Write: /(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:1}]/AugmentationIdentifier{childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions, (urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features, (urn:opendaylight:flow:inventory?revision=2013-08-19)port-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer, (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware, (urn:opendaylight:flow:inventory?revision=2013-08-19)software, (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)description, (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address, (urn:opendaylight:flow:inventory?revision=2013-08-19)meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)group, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group, (urn:opendaylight:flow:inventory?revision=2013-08-19)table, (urn:opendaylight:flow:inventory?revision=2013-08-19)table-features]}/(urn:opendaylight:flow:inventory?revision=2013-08-19)table/table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=136}]:ImmutableMapEntryNode{identifier=(urn:opendaylight:flow:inventory?revision=2013-08-19)table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=136}], value=[ImmutableAugmentationNode{identifier=AugmentationIdentifier{childNames=[(urn:opendaylight:flow:table:statistics?revision=2013-12-15)flow-table-statistics]}, value=[]}, ImmutableMapNode{identifier=(urn:opendaylight:flow:inventory?revision=2013-08-19)flow, value=[]}, ImmutableLeafNode{identifier=(urn:opendaylight:flow:inventory?revision=2013-08-19)id, value=136}]}

However when it works the table stats are written along with the flow stats:

2019-12-17T05:46:29,230 | DEBUG | epollEventLoopGroup-16-5 | SnapshotBackedWriteTransaction   | 369 - org.opendaylight.mdsal.dom-spi - 5.0.6 | Tx: member-1-datastore-operational-fe-0-chn-14-txn-3-0 Write: /(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:1}]/AugmentationIdentifier{childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions, (urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features, (urn:opendaylight:flow:inventory?revision=2013-08-19)port-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer, (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware, (urn:opendaylight:flow:inventory?revision=2013-08-19)software, (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)description, (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address, (urn:opendaylight:flow:inventory?revision=2013-08-19)meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)group, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group, (urn:opendaylight:flow:inventory?revision=2013-08-19)table, (urn:opendaylight:flow:inventory?revision=2013-08-19)table-features]}/(urn:opendaylight:flow:inventory?revision=2013-08-19)table/table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=136}]:ImmutableMapEntryNode{identifier=(urn:opendaylight:flow:inventory?revision=2013-08-19)table[{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=136}], value=[ImmutableAugmentationNode{identifier=AugmentationIdentifier{childNames=[(urn:opendaylight:flow:table:statistics?revision=2013-12-15)flow-table-statistics]}, value=[ImmutableContainerNode{identifier=(urn:opendaylight:flow:table:statistics?revision=2013-12-15)flow-table-statistics, value=[ImmutableLeafNode{identifier=(urn:opendaylight:flow:table:statistics?revision=2013-12-15)active-flows, value=0}, ImmutableLeafNode{identifier=(urn:opendaylight:flow:table:statistics?revision=2013-12-15)packets-looked-up, value=0}, ImmutableLeafNode{identifier=(urn:opendaylight:flow:table:statistics?revision=2013-12-15)packets-matched, value=0}]}]}, ImmutableMapNode{identifier=(urn:opendaylight:flow:inventory?revision=2013-08-19)flow, value=[]}, ImmutableLeafNode{identifier=(urn:opendaylight:flow:inventory?revision=2013-08-19)id, value=136}]}
Comment by Somashekhar Javalagi [ 23/Jan/20 ]

Hi ecelgp,

The transaction where flow-table-statistics is empty is performed by the thread opendaylight-cluster-data-akka.actor.default-dispatcher-27. But ofppool and epollEventLoopGroup threads which are from openflowplugin are writing non-empty stats. And also only at one place in openflowplugin we are writing into the transaction for operational inventory. I printed txn data by adding logs also. I could not see any write operation with empty stats.

So we have to find out is there some other entity other than openflowplugin modifying the stats.

 

Comment by Luis Gomez [ 23/Jan/20 ]

Should we move the bug to controller project?

Comment by Somashekhar Javalagi [ 27/Jan/20 ]

I'm planning to add some logs to track, from where the call to empty stats is arriving. After that we can decide to move it.

Comment by Luis Gomez [ 27/Jan/20 ]

OK, thanks Somashekhar

Comment by Somashekhar Javalagi [ 31/Jan/20 ]

ecelgp, I added some logs explicitly in org.opendaylight.mdsal.binding.api.WriteOperations class, which is the only api we are using to write data into operational inventory. I don't see any empty stats transaction is being received by the api. 

And also I am not seeing any transaction from akka-cluster thread(opendaylight-cluster-data-akka.actor.default-dispatcher) when i tried flapping switch connection multiple times while running locally, whichever is seen in csit job that you ran.

 

2019-12-17T05:47:14,136 | DEBUG | opendaylight-cluster-data-akka.actor.default-dispatcher-27 | SnapshotBackedWriteTransaction | 369 - org.opendaylight.mdsal.dom-spi - 5.0.6 | Tx: member-1-datastore-operational-fe-0-chn-19-txn-8-0 Write: /(urn:opendaylight:inventory?revision=2013-08-19)nodes/node/node[\{(urn:opendaylight:inventory?revision=2013-08-19)id=openflow:1}]/AugmentationIdentifier{childNames=[(urn:opendaylight:flow:inventory?revision=2013-08-19)supported-match-types, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-instructions, (urn:opendaylight:flow:inventory?revision=2013-08-19)supported-actions, (urn:opendaylight:flow:inventory?revision=2013-08-19)switch-features, (urn:opendaylight:flow:inventory?revision=2013-08-19)port-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)manufacturer, (urn:opendaylight:flow:inventory?revision=2013-08-19)hardware, (urn:opendaylight:flow:inventory?revision=2013-08-19)software, (urn:opendaylight:flow:inventory?revision=2013-08-19)serial-number, (urn:opendaylight:flow:inventory?revision=2013-08-19)description, (urn:opendaylight:flow:inventory?revision=2013-08-19)ip-address, (urn:opendaylight:flow:inventory?revision=2013-08-19)meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-meter, (urn:opendaylight:flow:inventory?revision=2013-08-19)group, (urn:opendaylight:flow:inventory?revision=2013-08-19)stale-group, (urn:opendaylight:flow:inventory?revision=2013-08-19)table, (urn:opendaylight:flow:inventory?revision=2013-08-19)table-features]}/(urn:opendaylight:flow:inventory?revision=2013-08-19)table/table[\{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=186}]:ImmutableMapEntryNode{identifier=(urn:opendaylight:flow:inventory?revision=2013-08-19)table[

{(urn:opendaylight:flow:inventory?revision=2013-08-19)id=186}

], value=[ImmutableAugmentationNode{identifier=AugmentationIdentifier

{childNames=[(urn:opendaylight:flow:table:statistics?revision=2013-12-15)flow-table-statistics]}

, value=[]},

Comment by Luis Gomez [ 10/Feb/20 ]

You mean you cannot reproduce the table stats issue when you run in your setup? If so, I think we can try to troubleshoot in Jenkins. Let me know which patch/DEBUG should I enable, otherwise I can use same as last time: "org.opendaylight.mdsal.dom.spi.store.SnapshotBackedWriteTransaction" .

Comment by Somashekhar Javalagi [ 11/Feb/20 ]

Yes, today only once I was able to reproduce transaction from akka-cluster thread(opendaylight-cluster-data-akka.actor.default-dispatcher) with empty stats. I have now raised patches by adding logs at openflowplugin level where there is chance of overwrite of table flow stats and controller level where there are write transactions from all the clients are received before submitting to mdsal executor service. please run the CSIT with below gerrit patches.

https://git.opendaylight.org/gerrit/#/c/controller/+/87566/

https://git.opendaylight.org/gerrit/#/c/openflowplugin/+/82676/

and with below log levels enabled

org.opendaylight.mdsal.dom.spi.store.SnapshotBackedWriteTransaction - DEBUG

Comment by Luis Gomez [ 12/Feb/20 ]

Here you are:

Jenkins:
https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-gate-flow-services-all-magnesium/49/

Logs:
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/openflowplugin-csit-1node-gate-flow-services-all-magnesium/49/odl_1/odl1_karaf.log.gz

Comment by Luis Gomez [ 13/Feb/20 ]

Hi Somashekhar and Arunprakash can you please take a look at the posted logs?

Magnesium looks pretty bad because of this bug:

https://jenkins.opendaylight.org/releng/job/openflowplugin-csit-1node-flow-services-all-magnesium/

Comment by Somashekhar Javalagi [ 13/Feb/20 ]

ecelgp

 

We looked into csit logs and we found the root cause of the issue.

After device acquires master ship role from the switch, we are triggering first StatisticsGathering and then submitting the first transaction to operational inventory which involves node, port and few stats information.

In the success case, stats collection is happening very fastly so that it will be added to the first transaction and the same will be submitted to datastore.

But in the failure case, the stats collection is delayed(because explicitly we are not making transaction to wait till first stats collection is done). So stats will be added to the next transaction. As part of stats collection, we are retrieving below details.

OFPMPTABLE

OFPMPGROUP

OFPMPMETER

OFPMPFLOW

OFPMPPORTSTATS

OFPMPQUEUE

 

As we cannot add flow stats without table stats in transaction, we are adding proper table stats(OFPMPTABLE) into the transaction. But later as part of flow stats addition, we are reading table stats from the datastore (which will be empty in datastore as stats was not part of first transaction in failure case), then we are building table by removing flow stats(OFPMPFLOW) and adding only table stats to remove earlier flow stats. Because of this, every time we are overwriting collected table stats with empty stats while adding flow stats.

 

To come over this issue, we are planning to introduce some synchronization after table stats are collected. For the same we tried to add fast fix and tried running csit. But due to javadoc build failure for master branch, we are not able to test csit.

 

Comment by Luis Gomez [ 13/Feb/20 ]

OK that makes sense. What is the gerrit you are trying to run CSIT? AFAIR the patch-test uses -Pq profile so it should not be generating javadocs.

Comment by Arunprakash D [ 13/Feb/20 ]

Its the same test patch. https://git.opendaylight.org/gerrit/#/c/openflowplugin/+/82676/

I have triggered the CSIT for it and will wait and see the results.

Comment by Arunprakash D [ 13/Feb/20 ]

ecelgp could you please run only the suit which was failing separately and let us know the results.

Comment by Somashekhar Javalagi [ 13/Feb/20 ]

With the synchronization on table stats write to datastore, we ran csit twice. We don't see any stats related test cases getting failed. But some scale test cases are failing because we are making every table stats transaction to pause the thread until transaction is completed. Due to this switch connections and oper inventory update is taking longer time, making scale test cases to fail upon number of nodes check.

We will make changes for table stats to hold thread only until transaction complete only for first time and run csit.

Comment by Somashekhar Javalagi [ 14/Feb/20 ]

ecelgp

I added few logs and changes for holding thread for first table stats only. Then I triggered csit(test-openflowplugin-core). But csit is still failing with same errors. I doubt with javadoc failures, whether latest changes from my review patch are taken or not. Because I don't see new logs that I added, in csit karaf log.

Comment by Luis Gomez [ 14/Feb/20 ]

Somashekhar and Arunprakash, the cluster test is not your issue, I think a last minute merge in master has produced some regression (see build 158 below):

https://jenkins.opendaylight.org/releng/job/openflowplugin-csit-3node-clustering-only-magnesium/

I will analyze this and revert back.

Comment by Luis Gomez [ 14/Feb/20 ]

Also are OFP merges from Feb 11 properly tested: https://git.opendaylight.org/gerrit/#/q/branch:master+status:merged+project:openflowplugin

Comment by Arunprakash D [ 14/Feb/20 ]

All the reviews, merged build has failed for those merged on Feb 12th, so the aritifacts won't be published to nexus repo.

Comment by Luis Gomez [ 14/Feb/20 ]

That is true but autorelease which is what triggers CSIT nowadays builds ODL projects from scratch, so any merged changes get tested daily anyway.

Comment by Somashekhar Javalagi [ 17/Feb/20 ]

ecelgp

We have added changes to wait for commit for first table stats write. With this csit seems to be fine. But one or two test cases are failing due to some delay in writing node information first time to operational inventory.

https://jenkins.opendaylight.org/releng/job/openflowplugin-patch-test-core-magnesium/47/

One or two nodes are getting connected after 1 or 2 second delay from the time of nodes check in oper inventory. So if test cases can be changed to have little delay before checking nodes from oper inventory, in case of more number of switches connecting, csit will not have any failures.

Can you please comment on this, so that we can have this fix ready.

Comment by Somashekhar Javalagi [ 17/Feb/20 ]

ecelgp  We have identified  a fix for this where we are collecting and writing flow stats first and then we are collecting table stats, to prevent overwrite of flow stats by table stats.

https://git.opendaylight.org/gerrit/c/openflowplugin/+/87624

I have run csit once and not seeing any flow table stats issues, can you please try running suit which includes table stats test cases multiple times, to see if it is stable.

Comment by Luis Gomez [ 17/Feb/20 ]

Cool, the simpler the better. I will run the stats suite multiple times to verify.

Comment by Luis Gomez [ 17/Feb/20 ]

I run 8 times test and I do not see issue.

Comment by Somashekhar Javalagi [ 18/Feb/20 ]

Thanks ecelgp

Generated at Wed Feb 07 20:34:07 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.