[CONTROLLER-1078] Clustering : 3 Node Clustering of ODL is not working Created: 19/Dec/14  Updated: 19/Oct/17  Resolved: 03/Jun/15

Status: Resolved
Project: controller
Component/s: mdsal
Affects Version/s: Post-Helium
Fix Version/s: None

Type: Bug
Reporter: Bhanu Prakash Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: Linux
Platform: PC


Attachments: Zip Archive akka-Moduleshardsconf.zip    
External issue ID: 2527

 Description   

Hi,
I am trying out ODL clustering as described in
https://wiki.opendaylight.org/view/Running_and_testing_an_OpenDaylight_Cluster
and I have an issue.
After I configured the 2 nodes (according to the wiki) and started the controllers with the karaf command, I see the following issues:
1. When I try to access the dlux UI (at the controller IP we put in akka.conf), it says "Unable to login".
2. On the node controller, I get the following error message (not able to connect to the controller):

opendaylight-user@root>Exception in thread "config-pusher" java.lang.IllegalStateException: Failed to send commit for configuration 05-clustering.xml(odl-mdsal-clustering,odl-mdsal-clustering)
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.sendRequestGetResponseCheckIsOK(ConfigPusherImpl.java:290)
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.pushConfig(ConfigPusherImpl.java:244)
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.pushConfigWithConflictingVersionRetries(ConfigPusherImpl.java:133)
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.internalPushConfigs(ConfigPusherImpl.java:109)
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.process(ConfigPusherImpl.java:76)
at org.opendaylight.controller.netconf.persist.impl.osgi.ConfigPersisterActivator$InnerCustomizer$1.run(ConfigPersisterActivator.java:178)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Error - getInstance() failed for ModuleIdentifier{factoryName='binding-broker-impl', instanceName='binding-broker-impl'} in transaction TransactionIdentifier{name='ConfigTransaction-39-41'}

at org.opendaylight.controller.config.manager.impl.ConfigTransactionControllerImpl.secondPhaseCommit(ConfigTransactionControllerImpl.java:402)
at org.opendaylight.controller.config.manager.impl.ConfigRegistryImpl.secondPhaseCommit(ConfigRegistryImpl.java:282)
at org.opendaylight.controller.config.manager.impl.ConfigRegistryImpl.commitConfig(ConfigRegistryImpl.java:229)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:193)
at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:175)
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:117)
at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:54)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at com.sun.jmx.mbeanserver.MXBeanProxy$InvokeHandler.invoke(MXBeanProxy.java:150)
at com.sun.jmx.mbeanserver.MXBeanProxy.invoke(MXBeanProxy.java:167)
at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:252)
at com.sun.proxy.$Proxy13.commitConfig(Unknown Source)
at org.opendaylight.controller.config.util.ConfigRegistryJMXClient.commitConfig(ConfigRegistryJMXClient.java:102)
at org.opendaylight.controller.netconf.confignetconfconnector.transactions.TransactionProvider.commitTransaction(TransactionProvider.java:110)
at org.opendaylight.controller.netconf.confignetconfconnector.operations.Commit.handleWithNoSubsequentOperations(Commit.java:55)
at org.opendaylight.controller.netconf.util.mapping.AbstractLastNetconfOperation.handle(AbstractLastNetconfOperation.java:33)
at org.opendaylight.controller.netconf.util.mapping.AbstractNetconfOperation.handle(AbstractNetconfOperation.java:105)
at org.opendaylight.controller.netconf.persist.impl.ConfigPusherImpl.sendRequestGetResponseCheckIsOK(ConfigPusherImpl.java:285)
... 6 more
Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: /192.168.31.190:2550
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Success.map(Try.scala:206)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.net.BindException: Cannot assign requested address
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296)
at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

3. When I make a REST GET call on member-1 (which should be the Leader controller), it returns Leader=null and RaftState=Candidate:

http://192.168.31.190:8181/jolokia/read/org.opendaylight.controller:Category=Shards,name=member-1-shard-inventory-config,type=DistributedConfigDatastore

{"timestamp":1418995610,"status":200,"request":

{"mbean":"org.opendaylight.controller:Category=Shards,name=member-1-shard-inventory-config,type=DistributedConfigDatastore","type":"read"}

,"value":{"ReadWriteTransactionCount":0,"LastLogIndex":-1,"MaxNotificationMgrListenerQueueSize":1000,"ReadOnlyTransactionCount":0,"LastLogTerm":-1,"CommitIndex":-1,"CurrentTerm":3255,"FailedReadTransactionsCount":0,"Leader":null,"ShardName":"member-1-shard-inventory-config","FailedTransactionsCount":0,"DataStoreExecutorStats":null,"CommittedTransactionsCount":0,"NotificationMgrExecutorStats":

{"activeThreadCount":0,"largestQueueSize":0,"currentThreadPoolSize":0,"maxThreadPoolSize":20,"totalTaskCount":0,"largestThreadPoolSize":0,"currentQueueSize":0,"completedTaskCount":0,"rejectedTaskCount":0,"maxQueueSize":1000}

,"AbortTransactionsCount":0,"LastApplied":-1,"LastCommittedTransactionTime":"1970-01-01 05:30:00.000","WriteOnlyTransactionCount":0,"RaftState":"Candidate","CurrentNotificationMgrListenerQueueStats":[]}}

Please help me.



 Comments   
Comment by Bhanu Prakash [ 19/Dec/14 ]

I am using this distribution:
distribution-karaf-0.2.1-Helium-SR1

Comment by Bhanu Prakash [ 23/Dec/14 ]

Attachment akka-Moduleshardsconf.zip has been added with description: conf files

Comment by Bhanu Prakash [ 23/Dec/14 ]

Attached are the conf files used for all 3 nodes.

Comment by Luis Gomez [ 24/Dec/14 ]

Hi Bhanu,

I guess this is one for the cluster experts in the controller project. Let me change the product in Bugzilla.

BR/Luis

Comment by Vratko Polak [ 07/Jan/15 ]

Hi Bhanu.

Looking at your config files, I can pattern-match this:

> Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: /192.168.31.190:2550

Each akka-member*.conf file has two lines starting with
hostname =
Each of them should hold an IP address local to the corresponding node, but currently they all read "192.168.31.190" regardless of node.
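
For example, on member-2 both sections would carry that node's own address. A minimal sketch, where 192.168.31.191 stands in for whatever address is actually local to member-2 (the ports are the stock defaults):

odl-cluster-data {
  akka {
    remote {
      netty.tcp {
        hostname = "192.168.31.191"   # must be an address local to THIS node
        port = 2550
      }
    }
  }
}

odl-cluster-rpc {
  akka {
    remote {
      netty.tcp {
        hostname = "192.168.31.191"   # same local address, separate port
        port = 2551
      }
    }
  }
}

(The seed-nodes lines are a different matter: those may legitimately point at the same remote member on every node.)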

Comment by Bhanu Prakash [ 07/Jan/15 ]

(In reply to Vratko Polák from comment #5)

Hello Vratko Polak,
Thanks for the update. I was under the impression that only one controller would be actively running (serving the requests from OVS) in a cluster, and that this hostname should point to that IP address in all the cluster nodes' config.
I'll make the change and see. Also, does the rest of the config look good?
Can you let me know how to make one particular node the leader for one particular data shard?

Comment by Vratko Polak [ 07/Jan/15 ]

Hi Bhanu.

> that only one controller would be actively running (serving the requests
> from OVS) in a cluster, and that this hostname should point to that IP
> address in all the cluster nodes' config.

In a cluster, all nodes should be able to serve requests from OVS.

> Can you let me know how to make one particular node the leader for one
> particular data shard?

As far as I know, the user cannot pinpoint which node becomes Leader.
https://wiki.opendaylight.org/view/OpenDaylight_Controller:MD-SAL:Architecture:Clustering#Replication

If all devices are configured to connect to the same ODL node,
there may be a performance benefit, at the expense of availability
(if that node crashes).
But that relates to the separate configuration of southbound plugins;
it has nothing to do with the akka hostname (where perhaps even "localhost" would work).
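
For completeness: which members hold a replica of each shard is declared in module-shards.conf, even though leadership among those replicas is still elected by Raft. A minimal sketch for the inventory module, assuming the usual member-1..member-3 names from akka.conf:

module-shards = [
    {
        name = "inventory"
        shards = [
            {
                name = "inventory"
                # every listed member hosts a replica; the leader is elected among them
                replicas = ["member-1", "member-2", "member-3"]
            }
        ]
    }
]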

> Also, does the rest of the config look good?

Nothing else looked wrong to my fallible eyes.

Vratko.

Comment by Bhanu Prakash [ 08/Jan/15 ]

(In reply to Vratko Polák from comment #7)

Hello Vratko,
Thanks for your answers. I have configured the cluster accordingly and can see things working as expected. I have one more observation:
Of the 3 nodes in the cluster, I can see that only one node can be accessed via the dlux UI. Is this expected?
And how do we determine which node's dlux UI can be accessed?

Comment by Bhanu Prakash [ 08/Jan/15 ]

In case data persistence is enabled (it is by default), where does the data sit in the file system?
Is there any way to configure this?

Comment by Vratko Polak [ 08/Jan/15 ]

Hi Bhanu.

> Of the 3 nodes in the cluster, I can see that only one node can be accessed
> via the dlux UI. Is this expected?
> And how do we determine which node's dlux UI can be accessed?

I have next to zero experience with dlux, but from https://wiki.opendaylight.org/view/OpenDaylight_dlux:Setup_and_Run#Run_the_DLUX_code_against_the_controller
I guess that one dlux instance can talk to only one ODL node:
"Based on where you controller is running, Update baseUrl in file dlux/dlux-web/config/development.json"

> In case data persistence is enabled (it is by default),
> where does the data sit in the file system?

In the /journal and /snapshot directories (under ${karaf.home}).

> Is there any way to configure this?

It seems to be the Akka default; I have not found a way to override it
(aside from using symlinks).

Vratko.

Comment by Bhanu Prakash [ 08/Jan/15 ]

(In reply to Vratko Polák from comment #10)

Hello Vratko,
I see the following behavior in the 3-node cluster.
All 3 members are up and running.
I started mininet and connected it to member-1 (mininet runs on the same node):
sudo mn --switch ovsk,protocols=OpenFlow13 --controller=remote --topo tree,depth=3
I can access the dlux UI only on member-1 (192.168.31.190); there I am able to log in and see the topology created by mininet.
BUT I am not able to log in to the dlux UI of the other 2 members of the cluster (member-2 and member-3): the login screen is visible, but it says "unable to login" on every try. My expectation is to be able to log in to the other members' dlux UI and see the same topology as on member-1 (since the data shards should have been replicated).
Please correct me if I am wrong.

Comment by Vratko Polak [ 08/Jan/15 ]

(In reply to Bhanu Prakash from comment #11)
> BUT I am not able to log in to the dlux UI of the other 2 members of the
> cluster (member-2 and member-3): the login screen is visible, but it says
> "unable to login" on every try. My expectation is to be able to log in to
> the other members' dlux UI and see the same topology as on member-1 (since
> the data shards should have been replicated).
> Please correct me if I am wrong.

Can you describe what the REST responses look like now?
curl -v -u 'admin:admin' localhost:8181/restconf/operational/network-topology:network-topology
(on each machine)
Depending on that, this may turn out to be a different bug on the dlux side.
If that is the case, I recommend closing this bug and opening a new one,
so that old comments do not obscure the current issue.
