[NETVIRT-1504] VpnInterfaceManager filling up logs at alarming rate Created: 13/Nov/18  Updated: 15/Nov/18  Resolved: 15/Nov/18

Status: Resolved
Project: netvirt
Component/s: vpnmanager
Affects Version/s: Fluorine-SR1
Fix Version/s: Fluorine-SR1, Neon

Type: Bug Priority: Highest
Reporter: Vishal Thapar Assignee: Vishal Thapar
Resolution: Done Votes: 0
Labels: csit:3node
Remaining Estimate: 0 minutes
Time Spent: 4 hours
Original Estimate: Not Specified


 Description   

Netvirt 3 node csit failing after 6 hours. Logs show two things:

1. Default config shard in error state on ODL1.
2. VPNInterfaceManager repeatedly filling log with error messages at a near instantaneous rate without any delays.

Cluster issues need to be looked into, but VPNInterfaceManager shouldn't be filling up logs with ERROR messages at such a high rate.
2018-11-07T00:34:55,909 | ERROR | jobcoordinator-main-task-0 | VpnInterfaceManager | 359 - org.opendaylight.netvirt.vpnmanager-impl - 0.7.1 | add: VpnInstance b15bbaf1-6b95-46c9-b19d-83d7719804ba for vpnInterface 052f2318-dd2a-4530-9703-fdd47b13c3cb not ready, holding on
2018-11-07T00:34:55,909 | ERROR | jobcoordinator-main-task-0 | VpnInterfaceManager | 359 - org.opendaylight.netvirt.vpnmanager-impl - 0.7.1 | add: VpnInstance b15bbaf1-6b95-46c9-b19d-83d7719804ba for vpnInterface 93517e2a-2534-42c4-a334-2454e04fb17f not ready, holding on
2018-11-07T00:34:55,910 | ERROR | jobcoordinator-main-task-0 | VpnInterfaceManager | 359 - org.opendaylight.netvirt.vpnmanager-impl - 0.7.1 | add: VpnInstance b15bbaf1-6b95-46c9-b19d-83d7719804ba for vpnInterface 4efba312-0be0-4a6a-a059-3636b344c2aa not ready, holding on
2018-11-07T00:34:55,911 | ERROR | jobcoordinator-main-task-0 | VpnInterfaceManager | 359 - org.opendaylight.netvirt.vpnmanager-impl - 0.7.1 | add: VpnInstance b15bbaf1-6b95-46c9-b19d-83d7719804ba for vpnInterface 1cb60d50-95dd-4ec6-a78e-375c90b36490 not ready, holding on

Guilty code is in VpnInterfaceManager.java:

@Override
public void add(final InstanceIdentifier<VpnInterface> identifier, final VpnInterface vpnInterface) {
LOG.trace("Received VpnInterface add event: vpnInterface={}", vpnInterface);
LOG.info("add: intfName {} onto vpnName {}", vpnInterface.getName(),
VpnHelper.getVpnInterfaceVpnInstanceNamesString(vpnInterface.getVpnInstanceNames()));
addVpnInterface(identifier, vpnInterface, null, null);
}

addVpnInterface eventuall calls

private void addVpnInterfaceCall(final InstanceIdentifier<VpnInterface> identifier, final VpnInterface vpnInterface,
final List<Adjacency> oldAdjs, final List<Adjacency> newAdjs, String vpnName) {
final VpnInterfaceKey key = identifier.firstKeyOf(VpnInterface.class);
final String interfaceName = key.getName();

if (!canHandleNewVpnInterface(identifier, vpnInterface, vpnName)) {
LOG.error("add: VpnInstance {} for vpnInterface {} not ready, holding on ",
vpnName, vpnInterface.getName());
return;
}

The log messages in above to methods are what are filling up the log.

This code should not be:
1. Retrying forever
2. Retrying without any delay.



 Comments   
Comment by Vishal Thapar [ 13/Nov/18 ]

Issue is in VPnInterfaceManager's add retry logic. It is working something like this:

1. Try to add interface.
2. If it fails, add it to unprocessed interfaces.
3. also OnFailure of DJC method is to process saved interfaces.

This means we're repeatedly trying to process it without any delay, basically an infinite loop.

Generated at Wed Feb 07 20:24:13 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.