[INTDIST-4] ConfigPusher failed on toaster config file and stopped new OF plugin from connecting to switches Created: 29/May/14  Updated: 03/May/18  Resolved: 19/Nov/16

Status: Resolved
Project: integration-distribution
Component/s: Features
Affects Version/s: unspecified
Fix Version/s: None

Type: Bug
Reporter: Hideyuki Tai Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Linux
Platform: PC


Issue Links:
Blocks
blocks INTDIST-5 missing openflowplugin (-of13) support Resolved
External issue ID: 1099

 Description   

I ran the Virtualization edition with the vtn option and the -of13 option.

$ ./run.sh -of13 -virt vtn

Three minutes later, ConfigPusher output the following error messages.
It seemed to me that ConfigPusher failed to set the configuration for the toaster bundle.

==================== Error messages =================================

2014-05-29 03:11:45.506 EDT [config-pusher] ERROR o.o.c.l.b.i.UncaughtExceptionPolicy - Thread Thread[config-pusher,5,main] died because of an uncaught exception
java.lang.IllegalStateException: Max wait for capabilities reached.Not enough capabilities for ConfigSnapshot

{configSnapshot='<data xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"><modules xmlns="urn:opendaylight:params:xml:ns:yang:controller:config"><module><type xmlns:toaster="urn:opendaylight:params:xml:ns:yang:controller:config:toaster-provider:impl"> toaster:toaster-provider-impl </type><name>toaster-provider-impl</name><rpc-registry><type xmlns:binding="urn:opendaylight:params:xml:ns:yang:controller:md:sal:binding">binding:binding-rpc-registry</type><name>binding-rpc-broker</name> </rpc-registry><data-broker><type xmlns:binding="urn:opendaylight:params:xml:ns:yang:controller:md:sal:binding">binding:binding-data-broker</type><name>binding-data-broker</name> </data-broker><notification-service><type xmlns:binding="urn:opendaylight:params:xml:ns:yang:controller:md:sal:binding"> binding:binding-notification-service </type><name>binding-notification-broker</name> </notification-service> </module><module><type xmlns:kitchen="urn:opendaylight:params:xml:ns:yang:controller:config:kitchen-service:impl"> kitchen:kitchen-service-impl </type><name>kitchen-service-impl</name><rpc-registry><type xmlns:binding="urn:opendaylight:params:xml:ns:yang:controller:md:sal:binding">binding:binding-rpc-registry</type><name>binding-rpc-broker</name> </rpc-registry><notification-service><type xmlns:binding="urn:opendaylight:params:xml:ns:yang:controller:md:sal:binding"> binding:binding-notification-service </type><name>binding-notification-broker</name> </notification-service> </module> </modules><services xmlns="urn:opendaylight:params:xml:ns:yang:controller:config"><service><type xmlns:kitchen="urn:opendaylight:params:xml:ns:yang:controller:config:kitchen-service:impl"> kitchen:kitchen-service </type><instance><name>kitchen-service</name><provider>/modules/module[type='kitchen-service-impl'][name='kitchen-service-impl']</provider> </instance> </service> </services> </data> ', 
capabilities=[urn:opendaylight:params:xml:ns:yang:controller:config:kitchen-service:impl?module=kitchen-service-impl&revision=2014-01-31, urn:opendaylight:params:xml:ns:yang:controller:config:toaster-provider:impl?module=toaster-provider-impl&revision=2014-01-31]}

. Expected but not found: [urn:opendaylight:params:xml:ns:yang:controller:config:toaster-provider:impl?module=toaster-provider-impl&revision=2014-01-31, urn:opendaylight:params:xml:ns:yang:controller:config:kitchen-service:impl?module=kitchen-service-impl&revision=2014-01-31]
at org.opendaylight.controller.netconf.persist.impl.ConfigPusher.getOperationServiceWithRetries(ConfigPusher.java:111) ~[na:na]
at org.opendaylight.controller.netconf.persist.impl.ConfigPusher.pushConfigWithConflictingVersionRetries(ConfigPusher.java:87) ~[na:na]
at org.opendaylight.controller.netconf.persist.impl.ConfigPusher.pushConfigs(ConfigPusher.java:69) ~[na:na]
at org.opendaylight.controller.netconf.persist.impl.osgi.ConfigPersisterActivator$InnerCustomizer$1.run(ConfigPersisterActivator.java:156) ~[na:na]
at java.lang.Thread.run(Thread.java:722) ~[na:1.7.0_15]
Caused by: org.opendaylight.controller.netconf.persist.impl.ConfigPusher$NotEnoughCapabilitiesException: Not enough capabilities for ConfigSnapshot

{configSnapshot='<data xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"><modules xmlns="urn:opendaylight:params:xml:ns:yang:controller:config"><module><type xmlns:toaster="urn:opendaylight:params:xml:ns:yang:controller:config:toaster-provider:impl"> toaster:toaster-provider-impl </type><name>toaster-provider-impl</name><rpc-registry><type xmlns:binding="urn:opendaylight:params:xml:ns:yang:controller:md:sal:binding">binding:binding-rpc-registry</type><name>binding-rpc-broker</name> </rpc-registry><data-broker><type xmlns:binding="urn:opendaylight:params:xml:ns:yang:controller:md:sal:binding">binding:binding-data-broker</type><name>binding-data-broker</name> </data-broker><notification-service><type xmlns:binding="urn:opendaylight:params:xml:ns:yang:controller:md:sal:binding"> binding:binding-notification-service </type><name>binding-notification-broker</name> </notification-service> </module><module><type xmlns:kitchen="urn:opendaylight:params:xml:ns:yang:controller:config:kitchen-service:impl"> kitchen:kitchen-service-impl </type><name>kitchen-service-impl</name><rpc-registry><type xmlns:binding="urn:opendaylight:params:xml:ns:yang:controller:md:sal:binding">binding:binding-rpc-registry</type><name>binding-rpc-broker</name> </rpc-registry><notification-service><type xmlns:binding="urn:opendaylight:params:xml:ns:yang:controller:md:sal:binding"> binding:binding-notification-service </type><name>binding-notification-broker</name> </notification-service> </module> </modules><services xmlns="urn:opendaylight:params:xml:ns:yang:controller:config"><service><type xmlns:kitchen="urn:opendaylight:params:xml:ns:yang:controller:config:kitchen-service:impl"> kitchen:kitchen-service </type><instance><name>kitchen-service</name><provider>/modules/module[type='kitchen-service-impl'][name='kitchen-service-impl']</provider> </instance> </service> </services> </data> ', 
capabilities=[urn:opendaylight:params:xml:ns:yang:controller:config:kitchen-service:impl?module=kitchen-service-impl&revision=2014-01-31, urn:opendaylight:params:xml:ns:yang:controller:config:toaster-provider:impl?module=toaster-provider-impl&revision=2014-01-31]}

. Expected but not found: [urn:opendaylight:params:xml:ns:yang:controller:config:toaster-provider:impl?module=toaster-provider-impl&revision=2014-01-31, urn:opendaylight:params:xml:ns:yang:controller:config:kitchen-service:impl?module=kitchen-service-impl&revision=2014-01-31]
at org.opendaylight.controller.netconf.persist.impl.ConfigPusher.getOperationService(ConfigPusher.java:149) ~[na:na]
at org.opendaylight.controller.netconf.persist.impl.ConfigPusher.getOperationServiceWithRetries(ConfigPusher.java:104) ~[na:na]
... 4 common frames omitted

==================== End =================================

I confirmed that the new OF plugin bundle was in the ACTIVE state.

osgi> ss openflowplugin
"Framework is launched."

id State Bundle
25 ACTIVE org.opendaylight.openflowplugin_0.0.3.SNAPSHOT
osgi>

However, the new OF plugin did not start listening on TCP ports 6633 and 6653.

odc180 $ netstat -na | grep 6633
odc180 $ netstat -na | grep 6653
odc180 $
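A plain `grep 6633` also matches unrelated lines (for example an established connection from ephemeral port 46633), so the check above can be tightened to accept only listening sockets on the exact port. The following is a minimal sketch, assuming the Linux net-tools `netstat -na` layout (local address in field 4, state in the last field); the helper name `is_listening` is hypothetical.

```shell
# Hypothetical helper: succeeds if a TCP socket is listening on port $1.
# Reads netstat-style lines from stdin so it can be fed `netstat -na` output.
# Assumes the Linux net-tools layout: field 4 = local address, last field = state.
is_listening() {
    awk -v port="$1" '
        $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}
```

It could be used as `netstat -na | is_listening 6633 || echo "nothing is listening on 6633"`, and likewise for 6653.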

Therefore, OpenFlow switches could not establish an OF channel with the controller.
I think that the new OF plugin was not successfully initialized.

I used the latest version of the Virtualization edition as of 05/27.

odc180 $ git log -3 --oneline
64e608a Adding extra TCs for RESTCONF
3516960 Uploaded working AD-SAL cluster testcases
9903694 Fix TCP-MD5 config not being included



 Comments   
Comment by Hideyuki Tai [ 29/May/14 ]

When we run the Virtualization edition with the vtn option, none of the sample bundles, including toaster, are loaded.

If I modify run.sh as follows so that the toaster bundle is loaded, I can avoid this issue.
In this case, the new OF plugin starts listening on TCP ports 6633 and 6653, so OpenFlow switches connect to the controller.

elif [ "${virt}" == "vtn" ]; then
    # ODL_VIRT_FILTER="affinity|opendove|ovsdb.ovsdb.neutron|controller.(arphandler|samples)"
    ODL_VIRT_FILTER="affinity|opendove|ovsdb.ovsdb.neutron|controller.(arphandler|samples.(simpleforwarding|loadbalancer))"

The toaster bundle is just a sample bundle, so I think the OpenDaylight controller must run successfully even when the toaster module is not loaded.
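To see what the two filter expressions in the run.sh excerpt do, the sketch below tests a few bundle names against each. It assumes that run.sh applies `ODL_VIRT_FILTER` as an extended regular expression and drops every bundle whose name matches; the helper `is_filtered` and the bundle names are illustrative, not copied from a real distribution.

```shell
# The two filter expressions quoted in the run.sh excerpt.
OLD_FILTER='affinity|opendove|ovsdb.ovsdb.neutron|controller.(arphandler|samples)'
NEW_FILTER='affinity|opendove|ovsdb.ovsdb.neutron|controller.(arphandler|samples.(simpleforwarding|loadbalancer))'

# Hypothetical helper: succeeds if bundle name $2 matches filter $1,
# i.e. the bundle would be dropped (assuming extended-regex matching).
is_filtered() {
    printf '%s\n' "$2" | grep -Eq "$1"
}

# Illustrative bundle names only.
for bundle in controller.samples.toaster controller.samples.simpleforwarding \
              controller.arphandler openflowplugin; do
    is_filtered "$OLD_FILTER" "$bundle" && old=drops || old=keeps
    is_filtered "$NEW_FILTER" "$bundle" && new=drops || new=keeps
    echo "$bundle: original filter $old it, modified filter $new it"
done
```

Under these assumptions only the toaster name changes from dropped to kept; simpleforwarding and arphandler are dropped by both filters, and openflowplugin is kept by both.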

Comment by Tomas Olvecky [ 29/May/14 ]

If you delete a bundle from base controller distribution you risk running into issues like this. Quick workaround would be to delete 03-toaster-sample.xml from configuration/initial . You might propose moving samples away from controller distribution on the mailing list.

Comment by Hideyuki Tai [ 29/May/14 ]

(In reply to Tomas Olvecky from comment #2)
> If you delete a bundle from base controller distribution you risk running
> into issues like this. Quick workaround would be to delete
> 03-toaster-sample.xml from configuration/initial . You might propose moving
> samples away from controller distribution on the mailing list.

It is definitely a bug, so it should be fixed.

The new OF plugin does not depend on toaster bundle.
Therefore, a problem related to the toaster bundle must NOT make an impact on the OF plugin.

Comment by Tomas Olvecky [ 29/May/14 ]

> The new OF plugin does not depend on toaster bundle.
> Therefore, a problem related to the toaster bundle must NOT make an impact
> on the OF plugin.

You have a dependency on whole controller distribution. The distribution is not guaranteed to be consistent if you start deleting bundles from it, because as in this case there might be configuration files dependent on those bundles.

Comment by Hideyuki Tai [ 29/May/14 ]

(In reply to Tomas Olvecky from comment #4)
> > The new OF plugin does not depend on toaster bundle.
> > Therefore, a problem related to the toaster bundle must NOT make an impact
> > on the OF plugin.
>
> You have a dependency on whole controller distribution. The distribution is
> not guaranteed to be consistent if you start deleting bundles from it,
> because as in this case there might be configuration files dependent on
> those bundles.

I think the configuration file of toaster depends on the toaster bundle.
However, this fact is not related to the new OF plugin.

The new OF plugin does NOT depend on the configuration file of toaster or on the toaster bundle.
And toaster is NOT a core component of the controller but just a sample application.
Therefore, I strongly believe that the controller project MUST guarantee that not loading the toaster bundle does not have any impact on other important bundles such as the new OF plugin.
The OpenDaylight controller uses the OSGi framework, so I think it should also support dynamically loading and unloading sample applications.

IF the new OF plugin depends on the configuration file of toaster, we should add the dependency to the pom.xml of the OF plugin bundle. (Again, I believe that the new OF plugin does NOT depend on the configuration file.)

Comment by Tomas Olvecky [ 29/May/14 ]

(In reply to Hideyuki Tai from comment #5)
> (In reply to Tomas Olvecky from comment #4)
> > > The new OF plugin does not depend on toaster bundle.
> > > Therefore, a problem related to the toaster bundle must NOT make an impact
> > > on the OF plugin.
> >
> > You have a dependency on whole controller distribution.
>
> The new OF plugin does NOT depend on the configuration file of toaster and
> toaster bundle.

I just checked that distributions-openflowplugin-base project is dependent on distribution.opendaylight . The 03-toaster-sample.xml belongs to distribution.opendaylight. If you think that this should not be the case, please consider filing a bug on how distribution is created, or propose adding another distribution. This bug cannot be solved in persister or netconf subsystem.

> And toaster is NOT a core component of controller but just a sample
> application.
> Therefore, I strongly believe that controller project MUST guarantee that
> not-loading toaster bundle does not have any impact on other important bundles
> such as the new OF plugin.
> OpenDaylight controller uses OSGi framework, so I think OpenDaylight
> controller should also support dynamically loading and unloading sample
> applications.

You can dynamically uninstall toaster bundles, or any other bundle, but again when you start modifying it, it is not guaranteed that controller will work.

Comment by Tony Tkacik [ 29/May/14 ]

(In reply to Hideyuki Tai from comment #5)
> (In reply to Tomas Olvecky from comment #4)
> > > The new OF plugin does not depend on toaster bundle.
> > > Therefore, a problem related to the toaster bundle must NOT make an impact
> > > on the OF plugin.
> >
> > You have a dependency on whole controller distribution. The distribution is
> > not guaranteed to be consistent if you start deleting bundles from it,
> > because as in this case there might be configuration files dependent on
> > those bundles.

As Tomas pointed out before, if you start removing bundles you are modifying
the controller distribution, so you also need to
remove the resources that depend on them, such as the configuration for toaster.

> I think the configuration file of toaster depends on toaster bundle.

configuration/initial says there MUST be toaster bundles in order to
start the controller successfully.

Your filter
ODL_VIRT_FILTER="affinity|opendove|ovsdb.ovsdb.neutron|controller.(arphandler|samples)"

removed these bundles but did not remove the configuration associated with them,
so the Config persister stopped pushing configuration, since the running controller
did not have the bundles required by the configuration loaded.

> However, this fact is not related to the new OF plugin.

Yes, this fact is not relevant to the new OF plugin; also, the new OF plugin
is not part of the controller.

> The new OF plugin does NOT depend on the configuration file of toaster and
> toaster bundle.
> And toaster is NOT a core component of controller but just a sample
> application.

I agree it is not a core component, but it is part of the controller distribution.

> Therefore, I strongly believe that controller project MUST guarantee that
> not-loading toaster bundle does not have any impact on other important bundles
> such as the new OF plugin.

This guarantee should not be part of the controller, but of the derived
distribution (in this case openflow / virtualization). Since the Controller project does
not know the target objectives of all derived distributions, this is the responsibility
of the derived distributions.

Comment by Hideyuki Tai [ 29/May/14 ]

(In reply to Tony Tkacik from comment #7)
> (In reply to Hideyuki Tai from comment #5)
> configuration/initial says, there MUST be toaster bundles in order to
> start controller successfully.
>
> Your filter
> ODL_VIRT_FILTER="affinity|opendove|ovsdb.ovsdb.neutron|controller.
> (arphandler|samples)"
>
> Removed these bundles, but did not remove configuration associated with them,
> thus Config persister stopped pushing configuration, since running controller
> did not have bundles loaded according to configuration.
>
>

Thank you for your comments.

My point is that the Config persister should NOT STOP on such errors.
I think that the Config persister should skip the error on the toaster configuration and CONTINUE to push the configuration for other bundles.

Is there any reason that the Config persister should STOP on such errors?
I think the right behavior is for the Config persister to skip such an error.
That offers us a safer way to load and unload bundles.

For now, I think we should submit a patch that changes run.sh for the Virtualization edition to delete the toaster configuration when a user executes run.sh with the vtn option.
(Because the affinity option also prevents the sample bundles from being loaded, I think we should also delete the configuration file when the affinity option is used.)

However, I think the Config persister should also be fixed.
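The run.sh change proposed in this comment could look roughly like the sketch below. It is not taken from the real run.sh: the function name, its arguments and the `.disabled` directory are hypothetical, and it moves the file aside instead of deleting it so that the change can be undone. Only the file name `03-toaster-sample.xml` and the `configuration/initial` directory come from this thread.

```shell
# Sketch only: function name, argument layout and the ".disabled" directory
# are hypothetical, not part of the real run.sh.
disable_sample_config() {
    virt=$1          # value of the -virt option, e.g. "vtn" or "affinity"
    initial_dir=$2   # path to configuration/initial
    case "$virt" in
        vtn|affinity)
            # These options filter out the sample bundles, so the toaster
            # configuration must not be left for the config pusher to find.
            if [ -f "${initial_dir}/03-toaster-sample.xml" ]; then
                mkdir -p "${initial_dir}.disabled"
                mv "${initial_dir}/03-toaster-sample.xml" "${initial_dir}.disabled/"
            fi
            ;;
    esac
}
```

A call such as `disable_sample_config "${virt}" configuration/initial` would have to run before the controller is launched; other config files in the directory are left untouched.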

Comment by Robert Varga [ 29/May/14 ]

I disagree. The initial configuration has to be consistent with the distribution, otherwise we are talking about a mis-integrated product.

If you remove a component and retain a configuration part, you have to also remove any references to it. If we were to implement the change you request, we could easily end up with a missing service definition and a system which looks as if it's completely up, except it is not.

The ConfigPusher behavior is precisely what it should be as per the 'fail fast, fail reliably' design philosophy. These integration problems have to be detected, reported and fixed, rather than be papered over.
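The two positions in this exchange (stop at the first configuration that cannot be applied, or skip it and carry on) can be contrasted with a toy model. This is only an illustration in shell, not ConfigPusher code: the `file:required-bundle` entry format, the `INSTALLED` list and the name `99-openflowplugin.xml` are all hypothetical.

```shell
# Toy model only: this is not ConfigPusher code.
# INSTALLED is a space-separated list of bundles considered loaded (hypothetical).
# Each argument after the mode has the form "file:required-bundle" (hypothetical).
push_configs() {
    mode=$1; shift
    for entry in "$@"; do
        file=${entry%%:*}
        bundle=${entry#*:}
        case " $INSTALLED " in
            *" $bundle "*)
                echo "pushed $file"
                ;;
            *)
                if [ "$mode" = "failfast" ]; then
                    # Fail-fast position: stop, later files are never pushed.
                    echo "ERROR $file: bundle $bundle missing, stopping"
                    return 1
                fi
                # Skip position: report the problem and continue.
                echo "WARN $file: bundle $bundle missing, skipped"
                ;;
        esac
    done
}

INSTALLED="netty md-sal openflowplugin"
push_configs failfast 00-netty.xml:netty 03-toaster-sample.xml:toaster \
    99-openflowplugin.xml:openflowplugin || echo "pusher stopped"
```

Calling the same function with `skip` instead of `failfast` pushes the last file as well, which is the behavior the reporter asks for; the fail-fast run never reaches it, which mirrors why the OF plugin ports stayed closed in this report.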

Comment by Hideyuki Tai [ 30/May/14 ]

(In reply to Robert Varga from comment #9)
> I disagree. The initial configuration has to be consistent with the
> distribution, otherwise we are talking about a mis-integrated product.
>
> If you remove a component and retain a configuration part, you have to also
> remove any references to it. If we were to implement the change you
> request, we could easily end up with a missing service definition and a system
> which looks as if it's completely up, except it is not.
>
> The ConfigPusher behavior is precisely what it should be as per the 'fail
> fast, fail reliably' design philosophy. These integration problems have to
> be detected, reported and fixed, rather than be papered over.

Thank you for your comment. I understand the design of Config Pusher.

I'm not saying that the integration issue should be papered over.
Here, I'm focusing on the behavior of the controller, which stops on an error in the toaster configuration and does not continue to push configuration for the new OF plugin, so I filed it against the Controller project.

I have not yet filed the integration issue with the Integration group, which has likely caused the confusion.

[1. Integration issue in Helium.]

The error on toaster configuration is an integration issue of the Virtualization Edition in Helium.
Therefore, I totally agree that this must be fixed.
My team has already worked on it, and I'm going to file this integration issue with the integration group in Bugzilla.

(I think I should have filed this integration issue first, before filing this ticket, to avoid confusion here.)

[2. No integration issue in Hydrogen.]

In the Hydrogen release, I submitted a patch that changed run.sh of the Virt edition so that sample bundles are not loaded.
https://git.opendaylight.org/gerrit/#/c/3669/

As far as I know, there is no integration issue in Hydrogen.

In Hydrogen, the toaster config file does not exist, so the toaster error does not occur.

$ cd opendaylight/configuration/initial
$ ls
00-netty.xml 01-md-sal.xml
$

I think a post-Hydrogen patch in the controller project made the Virt edition inconsistent...

[3. Config Pusher behavior]

In this bug report, I'm talking about controller behavior.

I think many people will try to create their own OpenDaylight Controller edition, so integration issues will happen again and again.
It is not easy to eliminate all integration issues as controller code changes every day.

To limit the impact of integration issues, I think Config Pusher should not propagate an error in one config file to other bundles.
I think Config Pusher should skip an error in a config file and continue to push configuration for other bundles.

I thought the current behavior of Config Pusher was not as you intended, so I thought it was a bug.
However, it seems that the current behavior is as you intended, so it is ok to change the status of this report to "RESOLVED".

Comment by Michal Rehak [ 30/May/14 ]

Moving this bug to integration, as this is clearly an integration issue: projects are filtered out without taking care of their configSubsystem files.

The same problem seems to affect the affinity filter too.

Comment by Hideyuki Tai [ 30/May/14 ]

(In reply to Robert Varga from comment #9)
>
> If you remove a component and retain a configuration part, you have to also
> remove any references to it. If we were to implement the change you
> request, we could easily end up with a missing service definition and a system
> which looks as if it's completely up, except it is not.
>
> The ConfigPusher behavior is precisely what it should be as per the 'fail
> fast, fail reliably' design philosophy. These integration problems have to
> be detected, reported and fixed, rather than be papered over.

In the Hydrogen release, I changed the Virt edition so that it does not load the toaster bundle, but the toaster config file did not exist at that time, so I couldn't remove it.

To detect a missing service definition or an inconsistent configuration, I think it is enough to output error messages.
I think that ceasing to push configuration for unrelated bundles is an excessive reaction.

Actually, my first observation of this issue was that the new OF 1.3 plugin did not work at all in the Virt edition.
So I thought there was something wrong in the new OF 1.3 plugin.
However, the issue was not related to the new OF 1.3 plugin.
If Config Pusher had not stopped pushing configuration for the new OF 1.3 plugin, I would not have been confused.

Comment by Luis Gomez [ 30/May/14 ]

Not sure if I understand the fix here, but if it is to filter out some bundles for a particular edition, projects can file the patch as they did in the past and I will approve it.

BR/Luis

Comment by Luis Gomez [ 30/May/14 ]

OK, after reading in more detail I think I understand better what is going on here, and I am of the opinion that a sample application in the Base edition should be easily removable, like all AD-SAL samples are now, not just for the VTN distribution but for any end user who wants to use only the controller Core components without any add-ons.

But there is another, more disturbing thing here that I was not even aware of: the OSGi modularity. We are telling everybody that our platform is totally modular in the way we bring bundles up and down when we need or do not need specific apps. So having a config file system (or ConfigPusher) that does not allow that seems to go against what we are selling.

If this behavior cannot be amended for any reason, we should at least let the community know about this limitation, as I do not think many people (like me) know about it.

BR/Luis

Comment by Jason Norstrand [ 30/May/14 ]

Luis,

I think really what this comes down to is another example of the need to move to Karaf. The Felix DM hack we abused in Hydrogen (while clever), was just a hack... and we found very quickly as we extended it that it was increasingly complex to get right.

We have also started hitting portions of the limitations of the single config.ini file approach. It turns out, lots of modules need a bit of config, and having them all in a single file is a really bad way to handle that in a modular system.

By actually having modular configs, you escape that problem, but you do actually need to manage the modularity of your configuration then as well.

The OFplugin config basically says:

"By the way, you need to wire into the OFplugin two instances of the OFjava protocol stack, one on port 6633 and another on port 6653". You could just as easily wire in additional instances of OFjava on other ports. That kind of configured modularity is key, and can't easily be done in a modular way with a simple config.ini + Felix DM. That configuration is also exposed at runtime, so that you can add new ports, remove existing ones, etc... all modularly.

The challenge we are facing is that we are hitting the wall on what our current hacks can support cleanly. We are at a point where we can continue to layer hacks on hacks on hacks (kind of like what we did in a pinch in Hydrogen) or we can graduate to more mature systems (Karaf being one example).

Given that we have 27 projects in Helium, and we need to get along, I don't think that script hacking on Felix DM + a single config.ini is going to scale.

That said, if we are going to hack our way one last time through the Hydrogen Stable... I would suggest we extend the current hack with the scripts so that the choice of flags for openflow selections simply moves the config file in or out of place. It's simple, and it will get us past the current blockage for stable Hydrogen, but continually hacking startup scripts to handle coexistence is not a good solution for Helium.

Comment by Ed Warnicke [ 30/May/14 ]

Apologies, was logged in as Jason Norstrand... comment above from Jason Norstrand was from Ed Warnicke.

Comment by Ed Warnicke [ 30/May/14 ]

Luis,

One other point of clarification.

The config subsystem allows for multiple modular configs. Each config gets processed separately, in order.

If there is a configuration present in the system that cannot be applied without the bundle to which it is to be applied, then it fails to be applicable. If we have provided configuration that cannot be applied, that's an error. Otherwise you wind up with a highly indeterminate system that silently fails in ways that may be quite surprising in production (as the silent Felix DM fail two days ago that started failing the tests did).

Other, independent configs proceeding it continue to be applied (if successful).

The config file for an app can be omitted if you don't want it, or moved out of the way by a script if we are using our current startup script hack. It is easily removable, just like every app is currently. The only difference is that if you are claiming to provide configuration for the system, it won't silently and indeterminately go on without complaining to you about it. In other words, no silent violation of expressed expectations.

Or to put it differently, the only new limitation here is that a certain class of Heisenbugs has been precluded.

Comment by Tony Tkacik [ 30/May/14 ]

I see this bug is IN_PROGRESS. May I ask the person who is working on it to take it,
so that we have a primary contact for this bugfix?

Comment by Luis Gomez [ 30/May/14 ]

Hi Jason Norstrand (Ed), thanks for the detailed explanation. I think this calls for a community discussion on how we want to handle config modularity in Helium. If Karaf is the solution, maybe we should wait until the Karaf go decision.

Hi Tony, I put the bug in progress because I really wanted to do something here, but I think it is easier if you or Hideyuki prepare the right patch as you know better the config changes that are needed here.

BR/Luis

Comment by Tomas Olvecky [ 02/Jun/14 ]

Sorry I didn't notice the bug changed product/component

Comment by Giovanni Meo [ 03/Jun/14 ]

Hi Tony, Robert and Ed,

I have seen your comments on this bug, and I actually believe this bug should be fixed in the config-subsystem: the current behavior ignores a very common case in networking, which is pre-provisioning. Here is the scenario:

T0) The admin of the controller wants to make sure that a given component, if installed, is configured such that a set of parameters complies with certain deployment guidelines.
T1) The controller starts and applies the configuration for all the components up to that point, but fails to start because a pre-provisioned component is not there yet. So it never reaches a state T2 where the component could eventually be installed in the system at a later time, maybe after an actual need for it arises.

This is something that Karaf won't solve either, because with Karaf you package a bundle together with its configuration, as an integrator would prepare it, while someone deploying the system is only interested in changing the configuration of that component without needing to create another Karaf feature for it.
Essentially, we are denying a deployment admin the possibility to override configurations beyond what the integrator supplies, but what is more fishy is that we deny it only for components that could be installed later on, not for the ones that are present at boot time. This is against the way an OSGi container, or any container with plugins for that matter, operates: you can add a plugin later on.

To add to this, pre-provisioning is possible today with pretty much all the other alternatives to config-subsystem, however simple or non-scalable they are. For example, by adding some simple configs to config.ini you can pre-provision and no one would complain (also because there is nothing to complain about; the component may just come later). configAdmin from the OSGi specification provides the same operation, and the Karaf container uses that too. Now, given that config-subsystem wants to replace configAdmin, it should provide at least the same level of support. The pre-provisioning feature is also available with the configuration subsystem used more widely in AD-SAL based components.
Essentially, forcing the current config-subsystem behavior breaks an important scenario for network operators and introduces a regression for components that want to migrate to config-subsystem.
So please fix the pre-provisioning scenario in config-subsystem, because it is needed in real-world scenarios.

Thanks,
Giovanni

Comment by Robert Varga [ 03/Jun/14 ]

I simply cannot agree with the assessment. The way ConfigPusher is designed is that the configuration it sees MUST be applied. If the configuration cannot be applied, that is an error of deployment.

What that really says is the following: the operator wants us to use this configuration, but we cannot apply it. We do not know why we cannot apply it. You are arguing for happily continuing – which would result in the system not knowing when it finished initializing.

As it stands, this issue is an integration problem. If anyone wants to open the design part of this, that has to be a separate enhancement, prioritized and scheduled.

Comment by Giovanni Meo [ 03/Jun/14 ]

(In reply to Robert Varga from comment #22)
> I simply cannot agree with the assessment. The way ConfigPusher is designed
> is that the configuration it sees MUST be applied. If the configuration
> cannot be applied, that is an error of deployment.

How do you plan to pre-provision the configuration for a bundle then?

> What that really says is the following: the operator wants us to use this
> configuration, but we cannot apply it. We do not know why we cannot apply
> it. You are arguing for happily continuing – which would result in the
> system not knowing when it finished initializing.

The bundle may come later on. I have never seen an Ubuntu machine fail to boot because there was a configuration file for Apache in the /etc directory even though Apache was not installed. Along this line I can bring many other examples from both the server and networking worlds. I don't see what the difference is here.

> As it stands, this issue is an integration problem. If anyone wants to open
> the design part of this, that has to be a separate enhancement, prioritized
> and scheduled.

I disagree with pushing the problem to integration. This is a limitation in config-subsystem.

Comment by Robert Varga [ 03/Jun/14 ]

Ah, but you will see a nice little failure if you attempt to configure something in Apache for which you do not have a module. This is exactly what is happening here.

Again, the component works as designed. There are two options: fix integration or change the design. Fixing the integration is a "bug"-level issue. Changing design is "enhancement"-level.

Comment by Giovanni Meo [ 03/Jun/14 ]

(In reply to Robert Varga from comment #24)
> Ah, but you will see a nice little failure if you attempt to configure
> something in apache for which you do not have a module. This is exactly what
> is happening here.
>
> Again, the component works as designed. There are two options: fix
> integration or change the design. Fixing the integration is a "bug"-level
> issue. Changing design is "enhancement"-level.

Robert, that is exactly the difference here. In this case toaster is not there, and you are claiming failure because of that absence. In the case you mentioned, the equivalent would be that "toaster" is there but failed to configure, which is a legitimate error case. Remember that the controller runs on OSGi: the absence of a bundle is not an exception path, it should be the main path.

Also, apparently the controller does not actually stop running; it only throws an alarming error, for nothing. So I think we can do a lot just by adjusting the verbosity of the emitted message and its log level. In particular, I would propose ERROR level if some bundle actually failed to configure, and WARNING if configuration could not be applied because the bundle does not exist, just to capture the difference between the two conditions.
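
The distinction proposed here could be sketched roughly as follows. This is only an illustration, not config-subsystem code: the class `PushLogLevel`, the `Outcome` enum and the bundle names are hypothetical, and `java.util.logging` levels stand in for whatever logger the controller uses (`SEVERE` plays the role of ERROR).

```java
import java.util.Set;
import java.util.logging.Level;

// Hypothetical sketch: choose a log level depending on why a configuration
// snapshot could not be applied. None of these names exist in the controller.
class PushLogLevel {

    enum Outcome { APPLIED, BUNDLE_MISSING, CONFIGURE_FAILED }

    // Classify the result of one push attempt for a single required bundle.
    static Outcome classify(String requiredBundle, Set<String> installedBundles,
                            boolean applySucceeded) {
        if (!installedBundles.contains(requiredBundle)) {
            // e.g. toaster configuration present, toaster bundle absent
            return Outcome.BUNDLE_MISSING;
        }
        return applySucceeded ? Outcome.APPLIED : Outcome.CONFIGURE_FAILED;
    }

    // ERROR (SEVERE here) only for a real failure; WARNING for an absent bundle.
    static Level levelFor(Outcome outcome) {
        switch (outcome) {
            case CONFIGURE_FAILED:
                return Level.SEVERE;
            case BUNDLE_MISSING:
                return Level.WARNING;
            default:
                return Level.INFO;
        }
    }
}
```

With only a "kitchen" bundle installed, a push that needs "toaster" would be classified as `BUNDLE_MISSING` and logged at WARNING, while a failed push for "kitchen" would be logged at the ERROR-equivalent level.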

Comment by Giovanni Meo [ 04/Jun/14 ]

(In reply to Jason Norstrand from comment #15)
> Luis,
>
> I think really what this comes down to is another example of the need to
> move to Karaf. The Felix DM hack we abused in Hydrogen (while clever), was
> just a hack... and we found very quickly as we extended it that it was
> increasingly complex to get right.
>
> We have also started hitting portions of the limitations of the single
> config.ini file approach. It turns out, lots of modules need a bit of
> config, and having them all in a single file is a really bad way to handle
> that in a modular system.
>
> By actually having modular configs, you escape that problem, but you do
> actually need to manage the modularity of your configuration then as well.
>
> The OFplugin config basically says:
>
> "By the way, you need to wire into the OFplugin two instances of the OFjava
> protocol stack, one on port 6633 and another on port 6653". You could just
> as easily wire in additional instances of OFjava on other ports. That kind
> of configured modularity is key, and can't be done with simple config.ini +
> Felix DM easily in a modular way. That configuration is also exposed at
> runtime, so that you can add new ports, remove existing ones, etc... all
> modularly.
>
> The challenge we are facing is that we are hitting the wall on what our
> current hacks can support cleanly. We are at a point where we can continue
> to layer hacks on hacks on hacks (kind of like what we did in a pinch in
> Hydrogen) or we can graduate to more mature systems (Karaf being one
> example).
>
> Given that we have 27 projects in Helium, and we need to get along, I don't
> think that script hacking on Felix DM + a single config.ini is going to
> scale.
>
> That said, if we are going to hack our way one last time through the
> Hydrogen Stable... I would suggest we extend the current hack with the
> scripts so that the choice of flags for openflow selections simply move the
> config file in or out of place. It's simple, and it will get us past the
> current blockage for stable hydrogen, but continually hacking startup
> scripts to handle coexistence is not a good solution for Helium.

Jason,

I believe you don't quite understand the role of Felix DM and the role of config.ini.
Apache Felix DM is used to track OSGi services; the mechanism is very flexible and modular. It also allows the use of annotated functions, which we are not using in ODL, but nothing prevents using them instead of the function callbacks. Still, it offers a much more flexible way to describe dependencies than a yang file, which is rather verbose and doesn't allow selecting services based on properties. Felix DM can also track not only OSGi services but also resource presence, and you can even create your own resource tracker.

The config.ini used to customize components is in no way related to Felix DM; config.ini is a convenient way to set -D parameters for the JVM, something you can do even without OSGi.
OSGi has its own mechanism to configure components, called configAdmin, which is very modular in practice and is used by Karaf as well. It is not limited to Karaf, though: you can code against configAdmin in the controller today, because the configAdmin service is included.
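
To illustrate the first point: taking the description above at face value, a value set through config.ini is visible as an ordinary JVM system property and can be read without any OSGi API. A minimal sketch, assuming a hypothetical property name `example.openflow.port` (not an actual controller setting) and a hypothetical helper class:

```java
// Hypothetical sketch: read a -D style parameter, as config.ini would set it.
class SystemPropertyExample {

    // Return the integer value of the system property, or the fallback when
    // the property is unset or is not a valid integer.
    static int intProperty(String key, int fallback) {
        String raw = System.getProperty(key);
        if (raw == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

Started with `-Dexample.openflow.port=6653`, a call to `intProperty("example.openflow.port", 6633)` would return 6653; without the flag it falls back to 6633.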

I'm just providing this information to correct a number of mistakes contained in this post; given that these comments are recorded, it is better to clean up the record with correct information.

Quite frankly, the only added value provided by config-subsystem is the two-phase commit on configuration changes; for the rest, it is on par neither with the service tracking provided by Apache Felix DM nor with the configuration and pre-provisioning features provided by other configuration methods available in OSGi.

Thanks,
Giovanni

Comment by Michal Rehak [ 04/Jun/14 ]

Hi all,
could you please move this discussion to mailing lists? I believe there you might get even more opinions and visions.

This bug is not about deciding the future of configSubsystem or the controller. This bug is currently blocking vtn and affinity from using "-of13".

So please, let's fix it by reintroducing consistency between bundles and configurations. Whatever the result of the discussion will be, it won't affect of13 support in vtn or affinity.

Thank you.

Comment by Giovanni Meo [ 04/Jun/14 ]

(In reply to michal rehak from comment #27)
> Hi all,
> could you please move this discussion to mailing lists? I believe there you
> might get even more opinions and visions.
>
> This bug is not about deciding the future of configSubsystem or the
> controller. This bug is currently blocking vtn and affinity from using "-of13".
>
> So please, let's fix it by reintroducing consistency between bundles and
> configurations. Whatever the result of the discussion will be, it won't
> affect of13 support in vtn or affinity.
>
> Thank you.

Michal,

this bug is about fixing an issue in config-subsystem, so this seems the correct place to discuss it.

Thanks,
Giovanni

Comment by Luis Gomez [ 04/Jun/14 ]

IMHO, we can do both things in parallel: 1) put a patch in the distributions so we do not block VTN people testing with the OF13 plugin, and 2) open a task in controller to improve the config sub-system behavior, especially if this is not a trivial thing to do. Does this make sense?

BR/Luis

Comment by Kamal Rameshan [ 04/Jun/14 ]

Fix to change the bundle filtering in the run script, to unblock the VTN team from using OF1.3:

https://git.opendaylight.org/gerrit/#/c/7698/

Comment by Ed Warnicke [ 05/Jun/14 ]

Should be fixed by:

https://git.opendaylight.org/gerrit/#/c/7703/

Comment by Ed Warnicke [ 05/Jun/14 ]

Filed https://bugs.opendaylight.org/show_bug.cgi?id=1135 for the secondary issue of the correct handling of the ERROR exception when we have configuration for which the controller lacks the bundles to provide the capabilities to fulfill it.

Comment by Giovanni Meo [ 05/Jun/14 ]

(In reply to Luis Gomez from comment #29)
> IMHO, we can do both things in parallel: 1) put a patch in the distributions
> so we do not block VTN people testing with OF13 plugin, and 2) open a task
> in controller to improve the config sub-system behavior, especially if this
> is not a trivial thing to do. Does this make sense?
>
> BR/Luis

It makes sense.

Thanks,
Giovanni

Comment by Tony Tkacik [ 21/Sep/14 ]

Decreased to normal, since the new features infrastructure allows customized installations and installs configuration only for installed features.

Comment by Luis Gomez [ 22/Sep/14 ]

This issue should no longer show up with the Karaf distribution; otherwise, please file a new bug in integration.

BR/Luis

Generated at Wed Feb 07 20:02:32 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.