[OPNFLWPLUG-1086] Reconciliation framework failure when starting cbench tool for the first time Created: 25/Feb/20 Updated: 27/Feb/20 Resolved: 27/Feb/20 |
|
| Status: | Verified |
| Project: | OpenFlowPlugin |
| Component/s: | reconciliation-framework |
| Affects Version/s: | None |
| Fix Version/s: | Magnesium |
| Type: | Bug | Priority: | Highest |
| Reporter: | Jamo Luhrsen | Assignee: | Gobinath Suganthan |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
this issue was noticed because an old CSIT job using cbench had a consistent failure that is not The cbench tool fails on starting with a "closed connection ... exiting" message. snippet from this karaf.log : 2020-02-22T03:05:59,721 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-6 | GuardedContextImpl | 310 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Starting RpcContextImpl[NEW] service for node openflow:0 2020-02-22T03:05:59,721 | WARN | ofppool-7 | ContextChainHolderImpl | 310 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Reconciliation framework failure. 2020-02-22T03:05:59,722 | INFO | ofppool-7 | ReconciliationManagerImpl | 302 - org.opendaylight.openflowplugin.applications.reconciliation-framework - 0.10.0.SNAPSHOT | Stopping reconciliation for node Uri{_value=openflow:10} 2020-02-22T03:05:59,726 | WARN | ofppool-0 | ContextChainHolderImpl | 310 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Reconciliation framework failure. 2020-02-22T03:05:59,726 | INFO | ofppool-0 | ReconciliationManagerImpl | 302 - org.opendaylight.openflowplugin.applications.reconciliation-framework - 0.10.0.SNAPSHOT | Stopping reconciliation for node Uri{_value=openflow:5} 2020-02-22T03:05:59,738 | INFO | ListenableFutureAdapter-thread-0 | ContextChainHolderImpl | 310 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Device openflow:3 connection is enabled by reconciliation framework. Only the first try fails though, and additional cbench tests are ok. We should figure out the root cause of the failure before we talk about the few workarounds
|
| Comments |
| Comment by Jamo Luhrsen [ 25/Feb/20 ] |
|
Attached is karaf.log from a fresh boot with TRACE enabled for 'org.opendaylight.openflowplugin.impl' the cbench command used was: cbench -c 10.1.154.174 -m 3000 -M 1 -s 2 -l 2 -D 3000 -p 6633 notice the -s 2 for a switch count of 2. The CSIT job uses much more, but I was trying to reduce the number |
| Comment by Arunprakash D [ 26/Feb/20 ] |
|
gobinath@ericsson.com will get back on the analysis by tomorrow |
| Comment by Gobinath Suganthan [ 26/Feb/20 ] |
|
jluhrsen I want to add some logging in openflowplugin project and re-run this cbench test for debugging. Any way I could start this job on a openflowplugin review? |
| Comment by Gobinath Suganthan [ 26/Feb/20 ] |
|
Also I could see the following errors in the karaf log: "2020-02-11T12:46:26,913 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-37 | RoleContextImpl | 310 - org.opendaylight.openflowplugin.impl - 0.10.0 | Device: openflow:6 with version: 1 does not support role BECOMEMASTER" What is the openflow version used by the switches in cbench? |
| Comment by Jamo Luhrsen [ 26/Feb/20 ] |
|
Once this patch is merged, you can add the comment: test-openflowplugin-cbench to the patch you are adding debug logs to and it should give you what you want. You'll want to make sure your |
| Comment by Jamo Luhrsen [ 26/Feb/20 ] |
|
yes, IIRC, cbench got stuck on OF1.1 and never made it to 1.3. |
| Comment by Jamo Luhrsen [ 26/Feb/20 ] |
|
|
| Comment by Gobinath Suganthan [ 27/Feb/20 ] |
|
thanks a lot jluhrsen |
| Comment by Gobinath Suganthan [ 27/Feb/20 ] |
|
I have raised a review & triggered the job but the job seem to be waiting with no available executor for too long. https://git.opendaylight.org/gerrit/c/openflowplugin/+/88084 Job link: https://jenkins.opendaylight.org/releng/job/openflowplugin-patch-test-cbench-magnesium/1/ Alternately could you please send the steps to set up cbench in local. (couldn't find complete working steps anywhere)
|
| Comment by Gobinath Suganthan [ 27/Feb/20 ] |
|
I had added some logging and rerun the tests. From the logs, it was found that the "flownodereconciliation" service was throwing an uncaught exception (ConcurrentModificationException) which had resulted in the reconciliation framework.
Logs: 2020-02-27T11:48:31,071 | ERROR | ofppool-2 | ContextChainHolderImpl | 310 - org.opendaylight.openflowplugin.impl - 0.10.0.SNAPSHOT | Reconciliation framework failure for device openflow:1 with error
I have fixed this and added some logs to debug similar issues. https://git.opendaylight.org/gerrit/c/openflowplugin/+/88084 Note: The "flownodereconciliation" task completes fast when the switch connects initially (no flows are present in inventory config for the node). So the race condition would be more probable in this case and we see might be seeing the cbench failures only during initial connection.
|
| Comment by Jamo Luhrsen [ 27/Feb/20 ] |
|
Great debugging and thanks for the quick fix, gobinath. the patch test job is passing now and I have merged the stable/magnesium patch. does this need to be cherry-picked to sodium? I assume it should be in master, so I already made that cherry pick |
| Comment by Gobinath Suganthan [ 27/Feb/20 ] |
|
Yes. This has to be cherry-picked to Sodium too. Cherry-picked here https://git.opendaylight.org/gerrit/c/openflowplugin/+/87965 It is surprising that the cbench is passing in Sodium however. |
| Comment by Jamo Luhrsen [ 27/Feb/20 ] |
|
actually, now that I think about it, I think I was able to reproduce this locally with sodium. Just didn't spend much time double checking. Maybe upstream CSIT is slightly slow enough in sodium + that shared cloud environment compared to my local laptop that the race condition isn't hit. |