[CONTROLLER-904] Randomly missing nodes in Inventory Created: 27/Sep/14  Updated: 25/Jul/23  Resolved: 01/Oct/14

Status: Resolved
Project: controller
Component/s: mdsal
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Christopher O'Shea Assignee: Christopher O'Shea
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 2103

 Description   

Ever now and then we're getting missing nodes in topology manager.

Example [1]
https://jenkins.opendaylight.org/integration/job/integration-master-csit-karaf-compatible-all/266/robot/report/report.html#suite_s1-s4-s1
Log [1]
https://jenkins.opendaylight.org/integration/job/integration-master-csit-karaf-compatible-all/266/artifact/karaf.log
2014-09-26 15:23:07,544

Example [2]
https://jenkins.opendaylight.org/integration/job/integration-master-csit-karaf-compatible-min/127/robot/report/report.html#suite_s1-s1-s2
https://jenkins.opendaylight.org/integration/job/integration-master-csit-karaf-compatible-min/127/artifact/karaf.log
2014-09-26 07:48:07,901

Example [3]
https://jenkins.opendaylight.org/integration/job/integration-master-csit-karaf-compatible-min/131/robot/report/report.html#suite_s1-s1
Log [3]
https://jenkins.opendaylight.org/integration/job/integration-master-csit-karaf-compatible-min/131/artifact/karaf.log

2014-09-26 12:03:37,408



 Comments   
Comment by Luis Gomez [ 27/Sep/14 ]

It is actually nodes and node connectors that do not get into the inventory.

This behavior started to be observed yesterday night on both my local machine + jenkins. Most likely it is introduced by this patch:

https://git.opendaylight.org/gerrit/#/c/11591/

BR/Luis

Comment by Luis Gomez [ 27/Sep/14 ]

So, after trying with this karaf distro:

http://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/distribution-karaf/0.2.0-SNAPSHOT/distribution-karaf-0.2.0-20140926.040841-177.zip

which is just before this got merged:

https://git.opendaylight.org/gerrit/#/c/11591/

Everything is GOOD, so I am 99% convinced the above patch in behind this bug.

BR/Luis

Comment by Ed Warnicke [ 27/Sep/14 ]

OK... so we actually have two bugs here from looking at the logs:

1) We are missing nodes and node connectors on the AD-SAL side
2) We are missing topology links in restconf

I think I have debugged #1 back to its base issue.

I see repeatedly in the logs an exception:

2014-09-26 20:56:00,650 | ERROR | -notification-20 | NotifyTask | 196 - org.opendaylight.controller.sal-binding-broker-impl - 1.1.0.SNAPSHOT | Unhandled exception thrown by listener: org.opendaylight.controller.sal.compatibility.InventoryAndReadAdapter$$Broker$ListenerInvoker@1c856c21
java.lang.IllegalStateException: Transaction TransactionImple < ac, BasicAction: 0:ffffc0a80403:d3da:5425d135:e4 status: ActionStatus.ABORTED > is not in a valid state to be invoking cache operations on.
at org.infinispan.interceptors.TxInterceptor.enlist(TxInterceptor.java:275)[220:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.TxInterceptor.enlistIfNeeded(TxInterceptor.java:231)[220:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]

<snipped below this for length>

It looks like its coming from the InventoryAndReadAdaptor... but if you actually drill down you find:

at org.opendaylight.controller.switchmanager.internal.SwitchManager.updateNodeConnector(SwitchManager.java:1179)[347:org.opendaylight.controller.switchmanager.implementation:0.4.2.SNAPSHOT]
at org.opendaylight.controller.sal.implementation.internal.Inventory.updateNodeConnector(Inventory.java:130)[210:org.opendaylight.controller.sal.implementation:0.4.2.SNAPSHOT]

Which basically means what is happening is the Adaptor is correctly publishing the node to the AD-SAL, but the AD-SAL is failing because cluster.services is not properly initialized.

I am pretty certain that this is because of CONTROLLER-893 and should be resolved by the same fix.

So we are left with just issue #2 (missing restconf topology links)

Comment by Ed Warnicke [ 27/Sep/14 ]

Tony reports no issues with repeatedly running 64 switch topology and suggests this may be a timeout issue in the test. Please try with longer timeouts and report back.

Comment by Tony Tkacik [ 27/Sep/14 ]

https://git.opendaylight.org/gerrit/#/c/11628/ - This ensures that after transaction failure Inventory Manager will continue to work.

I found out if inventory manager hit conflict in transaction and transaction failed because it stopped working, it did not retry to continue and whole processing thread was killed.

Fix is really simple, if transaction chain can not continue, it will create new chain
and try to continue using new chain.

Conflict in transaction could be caused by unaccounted updates from switch and/or concurrent writes by stats manager and/or l2switch.

Comment by Ed Warnicke [ 27/Sep/14 ]

I believe this fixes the missing node case:

https://git.opendaylight.org/gerrit/11632

Basically, what is happening is that if the odl-openflowplugin-southbound feature comes up and accepts connections before the odl-flow-services feature comes then there is no InventoryManager to receive the notifications and thus they are not recorded.

Comment by Ed Warnicke [ 27/Sep/14 ]

How to test this patch.

Before checking out the patch, try to reproduce the problem.

You should be able to reproduce the issue of missing nodes reliably by:

a) Stop controller
b) Start mininet
c) rm -rf ../data
d) Start controller
e) Check for missing nodes

The reasons this works, is that between the time that openflowplugin starts and when InventoryManager starts, at least one node usually connects and thus InventoryManager is not there to receive its NodeUpdated notification.

Verify that you can reliably produce missing nodes by this procedure.

Then:

After trying https://git.opendaylight.org/gerrit/11632

Repeat the steps above and verify you can't reproduce missing nodes.

Comment by Luis Gomez [ 28/Sep/14 ]

I am still seeing #1 with latest code time to time:

1) We are missing nodes and node connectors on the AD-SAL side

Is it possible to take a look on this?

This is probably the only thing missing...

BR/Luis

Comment by Luis Gomez [ 28/Sep/14 ]

Forgot to mention I see the AD-SAL missing inventory when I use large topologies like tree,6 for example. In LF is failing sometimes with tree,5.

Comment by Luis Gomez [ 28/Sep/14 ]

Not sure 100% if related but I see these 2 ERRORs very often in our suite testing karaf min (MD-SAL + AD-SAL):

2014-09-28 16:23:32,073 | ERROR | -notification-31 | InvocationContextInterceptor | 161 - org.opendaylight.controller.clustering.services-implementation - 0.4.3.SNAPSHOT | ISPN000136: Execution error
java.lang.IllegalStateException: Transaction TransactionImple < ac, BasicAction: 0:ffffc0a80403:bee0:54283466:16e2 status: ActionStatus.ABORTED > is not in a valid state to be invoking cache operations on.
at org.infinispan.interceptors.TxInterceptor.enlist(TxInterceptor.java:275)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.TxInterceptor.enlistIfNeeded(TxInterceptor.java:231)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.TxInterceptor.enlistReadAndInvokeNext(TxInterceptor.java:225)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.TxInterceptor.visitGetKeyValueCommand(TxInterceptor.java:221)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.commands.read.GetKeyValueCommand.acceptVisitor(GetKeyValueCommand.java:40)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.commands.AbstractVisitor.visitGetKeyValueCommand(AbstractVisitor.java:74)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.commands.read.GetKeyValueCommand.acceptVisitor(GetKeyValueCommand.java:40)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.statetransfer.StateTransferInterceptor.handleTopologyAffectedCommand(StateTransferInterceptor.java:263)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.statetransfer.StateTransferInterceptor.handleDefault(StateTransferInterceptor.java:247)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.commands.AbstractVisitor.visitGetKeyValueCommand(AbstractVisitor.java:74)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.commands.read.GetKeyValueCommand.acceptVisitor(GetKeyValueCommand.java:40)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.CacheMgmtInterceptor.visitGetKeyValueCommand(CacheMgmtInterceptor.java:92)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.commands.read.GetKeyValueCommand.acceptVisitor(GetKeyValueCommand.java:40)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:110)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:73)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.commands.AbstractVisitor.visitGetKeyValueCommand(AbstractVisitor.java:74)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.commands.read.GetKeyValueCommand.acceptVisitor(GetKeyValueCommand.java:40)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:333)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.CacheImpl.get(CacheImpl.java:377)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.CacheImpl.get(CacheImpl.java:369)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.opendaylight.controller.connectionmanager.scheme.AbstractScheme.removeNodeFromController(AbstractScheme.java:181)[270:org.opendaylight.controller.connectionmanager.implementation:0.1.2.SNAPSHOT]
at org.opendaylight.controller.connectionmanager.scheme.AbstractScheme.removeNode(AbstractScheme.java:169)[270:org.opendaylight.controller.connectionmanager.implementation:0.1.2.SNAPSHOT]
at org.opendaylight.controller.connectionmanager.internal.ConnectionManager.updateNode(ConnectionManager.java:218)[270:org.opendaylight.controller.connectionmanager.implementation:0.1.2.SNAPSHOT]
at org.opendaylight.controller.sal.implementation.internal.Inventory.updateNode(Inventory.java:115)[150:org.opendaylight.controller.sal.implementation:0.4.2.SNAPSHOT]
at org.opendaylight.controller.sal.compatibility.InventoryAndReadAdapter.publishNodeUpdate(InventoryAndReadAdapter.java:721)[164:org.opendaylight.controller.sal-compatibility:1.1.0.SNAPSHOT]
at org.opendaylight.controller.sal.compatibility.InventoryAndReadAdapter.onNodeRemoved(InventoryAndReadAdapter.java:411)[164:org.opendaylight.controller.sal-compatibility:1.1.0.SNAPSHOT]
at org.opendaylight.controller.sal.compatibility.InventoryAndReadAdapter$$Broker$ListenerInvoker.onNotification(InventoryAndReadAdapter$$Broker$ListenerInvoker.java)[164:org.opendaylight.controller.sal-compatibility:1.1.0.SNAPSHOT]
at org.opendaylight.controller.sal.binding.impl.AbstractNotificationListenerRegistration.notify(AbstractNotificationListenerRegistration.java:38)[135:org.opendaylight.controller.sal-binding-broker-impl:1.1.0.SNAPSHOT]
at org.opendaylight.controller.sal.binding.impl.NotifyTask.run(NotifyTask.java:42)[135:org.opendaylight.controller.sal-binding-broker-impl:1.1.0.SNAPSHOT]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)[:1.7.0_65]
at java.util.concurrent.FutureTask.run(FutureTask.java:262)[:1.7.0_65]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_65]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_65]

2014-09-28 16:23:32,073 | ERROR | -notification-31 | NotifyTask | 135 - org.opendaylight.controller.sal-binding-broker-impl - 1.1.0.SNAPSHOT | Unhandled exception thrown by listener: org.opendaylight.controller.sal.compatibility.InventoryAndReadAdapter$$Broker$ListenerInvoker@6736629f
java.lang.IllegalStateException: Transaction TransactionImple < ac, BasicAction: 0:ffffc0a80403:bee0:54283466:16e2 status: ActionStatus.ABORTED > is not in a valid state to be invoking cache operations on.
at org.infinispan.interceptors.TxInterceptor.enlist(TxInterceptor.java:275)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.TxInterceptor.enlistIfNeeded(TxInterceptor.java:231)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.TxInterceptor.enlistReadAndInvokeNext(TxInterceptor.java:225)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.TxInterceptor.visitGetKeyValueCommand(TxInterceptor.java:221)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.commands.read.GetKeyValueCommand.acceptVisitor(GetKeyValueCommand.java:40)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.commands.AbstractVisitor.visitGetKeyValueCommand(AbstractVisitor.java:74)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.commands.read.GetKeyValueCommand.acceptVisitor(GetKeyValueCommand.java:40)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.statetransfer.StateTransferInterceptor.handleTopologyAffectedCommand(StateTransferInterceptor.java:263)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.statetransfer.StateTransferInterceptor.handleDefault(StateTransferInterceptor.java:247)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.commands.AbstractVisitor.visitGetKeyValueCommand(AbstractVisitor.java:74)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.commands.read.GetKeyValueCommand.acceptVisitor(GetKeyValueCommand.java:40)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.CacheMgmtInterceptor.visitGetKeyValueCommand(CacheMgmtInterceptor.java:92)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.commands.read.GetKeyValueCommand.acceptVisitor(GetKeyValueCommand.java:40)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:110)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:73)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.commands.AbstractVisitor.visitGetKeyValueCommand(AbstractVisitor.java:74)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.commands.read.GetKeyValueCommand.acceptVisitor(GetKeyValueCommand.java:40)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:333)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.CacheImpl.get(CacheImpl.java:377)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.infinispan.CacheImpl.get(CacheImpl.java:369)[161:org.opendaylight.controller.clustering.services-implementation:0.4.3.SNAPSHOT]
at org.opendaylight.controller.connectionmanager.scheme.AbstractScheme.removeNodeFromController(AbstractScheme.java:181)[270:org.opendaylight.controller.connectionmanager.implementation:0.1.2.SNAPSHOT]
at org.opendaylight.controller.connectionmanager.scheme.AbstractScheme.removeNode(AbstractScheme.java:169)[270:org.opendaylight.controller.connectionmanager.implementation:0.1.2.SNAPSHOT]
at org.opendaylight.controller.connectionmanager.internal.ConnectionManager.updateNode(ConnectionManager.java:218)[270:org.opendaylight.controller.connectionmanager.implementation:0.1.2.SNAPSHOT]
at org.opendaylight.controller.sal.implementation.internal.Inventory.updateNode(Inventory.java:115)[150:org.opendaylight.controller.sal.implementation:0.4.2.SNAPSHOT]
at org.opendaylight.controller.sal.compatibility.InventoryAndReadAdapter.publishNodeUpdate(InventoryAndReadAdapter.java:721)[164:org.opendaylight.controller.sal-compatibility:1.1.0.SNAPSHOT]
at org.opendaylight.controller.sal.compatibility.InventoryAndReadAdapter.onNodeRemoved(InventoryAndReadAdapter.java:411)[164:org.opendaylight.controller.sal-compatibility:1.1.0.SNAPSHOT]
at org.opendaylight.controller.sal.compatibility.InventoryAndReadAdapter$$Broker$ListenerInvoker.onNotification(InventoryAndReadAdapter$$Broker$ListenerInvoker.java)[164:org.opendaylight.controller.sal-compatibility:1.1.0.SNAPSHOT]
at org.opendaylight.controller.sal.binding.impl.AbstractNotificationListenerRegistration.notify(AbstractNotificationListenerRegistration.java:38)[135:org.opendaylight.controller.sal-binding-broker-impl:1.1.0.SNAPSHOT]
at org.opendaylight.controller.sal.binding.impl.NotifyTask.run(NotifyTask.java:42)[135:org.opendaylight.controller.sal-binding-broker-impl:1.1.0.SNAPSHOT]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)[:1.7.0_65]
at java.util.concurrent.FutureTask.run(FutureTask.java:262)[:1.7.0_65]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_65]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_65]

Comment by Colin Dixon [ 28/Sep/14 ]

Luis says this isn't fixed yet.

Comment by Colin Dixon [ 28/Sep/14 ]

Flagging this is as critical so it shows up in the must fix bug search. Also, noting that the current attempt has been merged in this patch:
https://git.opendaylight.org/gerrit/#/c/11628/1

However, as noted in the previous comment, even with this patch Luis is seeing failures.

Comment by Jan Medved [ 28/Sep/14 ]

Still seeing the problem with a recent integration image. It happens 100% of the time when controller started with a running mininet. For example, with a 15-node tree topology, the controller consistently show 13 nodes. (i.e. it 100% of the time misses 3 nodes).

Comment by Luis Gomez [ 29/Sep/14 ]

Regarding the AD-SAL issue, I think it is just the nodeconnector id=0 that sometimes does not get reported. This nodeconnector is not really a port but the switch (see below switch manager printout):

{
"nodeConnectorProperties": [
{
"properties": {
"name":

{ "value": "s1-eth2" }

,
"state":

{ "value": 1 }

,
"config":

{ "value": 1 }

,
"bandwidth":

{ "value": 10000000000 }

},
"nodeconnector": {
"node":

{ "id": "00:00:00:00:00:00:00:01", "type": "OF" }

,
"id": "2",
"type": "OF"
}
},
{
"properties": {
"name":

{ "value": "s1-eth1" }

,
"state":

{ "value": 1 }

,
"config":

{ "value": 1 }

,
"bandwidth":

{ "value": 10000000000 }

},
"nodeconnector": {
"node":

{ "id": "00:00:00:00:00:00:00:01", "type": "OF" }

,
"id": "1",
"type": "OF"
}
},
{
"properties": {
"name":

{ "value": "s1" }

,
"state":

{ "value": 1 }

,
"config":

{ "value": 1 }

},
"nodeconnector": {
"node":

{ "id": "00:00:00:00:00:00:00:01", "type": "OF" }

,
"id": "0",
"type": "SW"
}
}
]
}

So there is an issue reporting this particular port but it may not be maybe a show stopper.

BR/Luis

Comment by Luis Gomez [ 29/Sep/14 ]

OK, I am not sure what this does, but this patch seems to fix the AD-SAL issue:

https://git.opendaylight.org/gerrit/#/c/11654/

BR/Luis

Comment by Ed Warnicke [ 30/Sep/14 ]

Reproduction and debugging so far:

1) Start from here: https://git.opendaylight.org/gerrit/#/c/11663 (controller)
2) Extra debugs for OF plugin: https://git.opendaylight.org/gerrit/11666
3) Build the integration controller -nsu after building those.
4) Edit target/assembly/etc/org.ops4j.pax.logging.cfg and change the line:
log4j.appender.out.maxFileSize=1MB

to

log4j.appender.out.maxFileSize=100MB
5) Run controller, and use:

log:set DEBUG org.opendaylight.controller.md.inventory.manager.NodeChangeCommiter
log:set DEBUG org.opendaylight.openflowplugin.openflow.md.core.sal.SalRegistrationManager
log:set DEBUG org.opendaylight.openflowplugin.openflow.md.queue.QueueKeeperFairImpl
log:set DEBUG org.opendaylight.openflowplugin.openflow.md.queue.TicketFinisherImpl
log:set DEBUG org.opendaylight.openflowplugin.openflow.md.queue.TicketProcessorFactoryImpl
log:set DEBUG org.opendaylight.openflowplugin.openflow.md.queue.QueueProcessorLightImpl
feature:install odl-l2switch-switch-ui

6) Run a large mininet topology:
sudo mn --topo tree,7 --controller 'remote,ip=10.0.2.2' --switch ovsk,protocols=OpenFlow13

7) Check to see how many nodes you have:
curl --user admin:admin http://localhost:8181/restconf/operational/opendaylight-inventory:nodes/ | python -m json.tool | grep '\"id\": \"openflow:' | sort | grep -v ":[0-9]:[0-9]" | wc -l

Should be 127, but often is one or two less. If there are one or two less, then we can start debugging.

8) cd data/log
9) Check for the number of NodeUpdated events originating from SalRegistationManager.onSessionAdded() line 103:

LOG.debug("ModelDrivenSwitch for {} registered to MD-SAL.", datapathId.toString());

By running

grep "registered to MD-SAL" * | grep "ModelDrivenSwitch" | wc -l

For me, this is always 127, even when I don't see the full 127 in restconf.

10) Check for the number of NodeUpdated events coming out of the TicketFinisherImpl.firePopNotification() line 78:

grep "about to apply popListener" * | grep NodeUpdated | awk -F'_value=' '

{print $2}' | sort -u | wc -l

For me this always shows the number of flows that I see in restconf... which I interpret as meaning we are loosing a message in between in OFplugin.

11) Check for the number of Node Updated events coming out of the QueueKeeperFairImpl.poll() line 82:

grep "remove message from queue" * | grep NodeUpdated | awk -F'_value=' '{print $2}

' | wc -l

For me this is always the same number I see in restconf, meaning we are loosing the message before this.

12) Check for the number of NodeUpdated events coming out of the QueueKeeperImpl.push() line 56:

grep "enqueuing message" * | awk -F'_value=' '

{print $2}

' | wc -l

For me this is always 127.

So we appear to be loosing the NodeUpdated messages between
QueueKeeperImpl.push() and QueueKeeperFairImpl.poll()

Comment by Luis Gomez [ 30/Sep/14 ]

Hi Ed,

I tried several times mininet tree,7 with odl-l2switch-switch-ui feature and I cannot reproduce the issue you describe. I always get the 127 nodes along with all stats.

BR/Luis

Comment by Luis Gomez [ 30/Sep/14 ]

With regards to AD-SAL missing inventory, I downloaded latest RC1:

http://nexus.opendaylight.org/content/repositories/opendaylight.daily/org/opendaylight/integration/distribution-karaf/0.2.0-Helium-RC1-6-v201409220013/

And there I do not see any AD-SAL inventory issue running mininet tree,7

One thing I realize with this build is the stats collection at mininet startup follows very much the order the nodes are brought up: openflow:1 -> openflow:127

Since RC2 the stats collection is kind of random, jumping from 1 switch to another with no particular order. I believe this is part of the stats improvements but maybe this way of collecting is impacting how AD-SAL learns inventory

BR/Luis

Comment by Tony Tkacik [ 01/Oct/14 ]

Merged patchset: https://git.opendaylight.org/gerrit/#/c/11675/

Please verify and close this bug, if it is fixed.

Comment by Christopher O'Shea [ 01/Oct/14 ]

Patch been tested and working

Generated at Wed Feb 07 19:54:10 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.