[CONTROLLER-1441] Clustering : OF ports (6653,6633) not coming up in couple of nodes after all required OF bundle installated in 5 node cluster ( seen very frequently 9 of 10 times in SR2) Created: 03/Nov/15  Updated: 18/Feb/16  Resolved: 18/Feb/16

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: Lithium
Fix Version/s: None

Type: Bug
Reporter: Sanjib Mohapatra Assignee: Moiz Raja
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: File c1.karaf.log.rar    
External issue ID: 4572
Priority: Normal

 Description   

Clustering Issue : OF ports (6653,6633) not coming up in couple of nodes after all required OF bundle installed in 5 node cluster ( seen very very frequently 9 outof 10 times in SR2. Please refer below steps.

1.
Using SR2 build, Starting a 5 node cluster ( c1, c2, c3, c4 & c5) using deploy script as below.

./deploy_odl.py --clean --distribution=/home/mininet/controller-Li/distribution-karaf-0.3.2-Lithium-SR2.zip --rootdir=/home/mininet/controller-Li --hosts=10.183.181.41,10.183.181.42,10.183.181.43,10.183.181.44,10.183.181.45 --user=root --password=rootroot --template=/multi-node-test

2.
Installed following bundles in all nodes one after another, after ensuring (odl-cluster-data & odl-cluster-rpc Ports) 2550 & 2551 are up.

root@mininet-vm:~# netstat -na | grep 2550
tcp6 0 0 10.183.181.41:2550 :::* LISTEN
tcp6 0 0 10.183.181.41:2550 10.183.181.45:59936 ESTABLISHED
tcp6 0 0 10.183.181.41:38763 10.183.181.42:2550 ESTABLISHED
tcp6 0 0 10.183.181.41:2550 10.183.181.43:60935 ESTABLISHED
tcp6 0 0 10.183.181.41:32937 10.183.181.43:2550 ESTABLISHED
tcp6 0 0 10.183.181.41:47800 10.183.181.45:2550 ESTABLISHED
tcp6 0 0 10.183.181.41:54118 10.183.181.44:2550 ESTABLISHED
tcp6 0 0 10.183.181.41:2550 10.183.181.44:34235 ESTABLISHED
tcp6 0 0 10.183.181.41:2550 10.183.181.42:51515 ESTABLISHED
root@mininet-vm:~# netstat -na | grep 2551
tcp6 0 0 10.183.181.41:2551 :::* LISTEN
tcp6 0 0 10.183.181.41:2551 10.183.181.42:33333 ESTABLISHED
tcp6 0 0 10.183.181.41:2551 10.183.181.44:36155 ESTABLISHED
tcp6 0 0 10.183.181.41:2551 10.183.181.43:49365 ESTABLISHED
tcp6 0 0 10.183.181.41:35548 10.183.181.42:2551 ESTABLISHED
tcp6 0 0 10.183.181.41:58923 10.183.181.44:2551 ESTABLISHED
tcp6 0 0 10.183.181.41:39818 10.183.181.45:2551 ESTABLISHED
tcp6 0 373 10.183.181.41:42423 10.183.181.43:2551 ESTABLISHED
tcp6 0 0 10.183.181.41:2551 10.183.181.45:40387 ESTABLISHED

feature:install odl-clustering-test-app
feature:install odl-restconf-all
feature:install odl-mdsal-all
feature:install odl-openflowplugin-all-li

3. OF ports 6633, 6653 came up in all 4 nodes (c2, c3, c4, c5) excpet c1.

c1:
===
root@mininet-vm:~# netstat -na | grep 665*
tcp 0 0 0.0.0.0:6644 0.0.0.0:* LISTEN
unix 3 [ ] STREAM CONNECTED 9662
unix 3 [ ] STREAM CONNECTED 1866
unix 3 [ ] STREAM CONNECTED 8566 @/com/ubuntu/upstart

c2, c3, c4, c5:
===============
root@mininet-vm:~# netstat -na | grep 665*
tcp 0 0 0.0.0.0:6644 0.0.0.0:* LISTEN
tcp6 0 0 :::6633 :::* LISTEN
tcp6 0 0 :::6653 :::* LISTEN

4. checked suspected ERROR logs

2015-11-02 22:13:17,361 | ERROR | config-pusher | ConfigTransactionControllerImpl | 125 - org.opendaylight.controller.config-manager - 0.3.2.Lithium-SR2 | Commit failed on ModuleIdentifier

{factoryName='openflow-plugin-provider-impl', instanceName='openflow-plugin-provider-impl'} in transaction TransactionIdentifier{name='ConfigTransaction-86-88'}
java.lang.IllegalStateException: java.util.concurrent.ExecutionException: TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]}
at org.opendaylight.openflowplugin.impl.device.DeviceManagerImpl.<init>(DeviceManagerImpl.java:155)
at org.opendaylight.openflowplugin.impl.OpenFlowPluginProviderImpl.initialize(OpenFlowPluginProviderImpl.java:169)
at org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.config.openflow.plugin.impl.rev150327.OpenFlowProviderModule.createInstance(OpenFlowProviderModule.java:40)
at org.opendaylight.controller.config.spi.AbstractModule.getInstance(AbstractModule.java:73)
at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)[:1.7.0_65]
at java.lang.reflect.Method.invoke(Method.java:606)[:1.7.0_65]
at org.opendaylight.controller.config.manager.impl.dependencyresolver.DependencyResolverManager$ModuleInvocationHandler.handleInvocation(DependencyResolverManager.java:150)

5. Giving grep output of all errors in logfile. Also attaching karaf log files.

root@mininet-vm:/home/mininet/controller-Li/deploy/current/odl/data/log# grep ERROR karaf.log.1
2015-11-02 22:05:11,708 | ERROR | Event Dispatcher | FeatureDeploymentListener | 21 - org.apache.karaf.deployer.features - 3.0.3 | Unable to install features
2015-11-02 22:10:27,389 | ERROR | config-pusher | DistributedDataStore | 191 - org.opendaylight.controller.sal-distributed-datastore - 1.2.2.Lithium-SR2 | Shared leaders failed to settle in 90 seconds, giving up
2015-11-02 22:11:57,500 | ERROR | config-pusher | DistributedDataStore | 191 - org.opendaylight.controller.sal-distributed-datastore - 1.2.2.Lithium-SR2 | Shared leaders failed to settle in 90 seconds, giving up
2015-11-02 22:12:35,456 | ERROR | CommitFutures-1 | NetconfDeviceDatastoreAdapter | 243 - org.opendaylight.controller.sal-netconf-connector - 1.2.2.Lithium-SR2 | RemoteDevice{controller-config}: TransactionChain(org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter@adef923) DOM-CHAIN-0-0 FAILED!
TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]}
2015-11-02 22:12:35,253 | ERROR | CommitFutures-0 | NetconfDeviceDatastoreAdapter | 243 - org.opendaylight.controller.sal-netconf-connector - 1.2.2.Lithium-SR2 | RemoteDevice{controller-config}: Transaction(init) DOM-CHAIN-0-0 FAILED!
TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]}
2015-11-02 22:12:36,644 | ERROR | CommitFutures-4 | NetconfDeviceTopologyAdapter | 243 - org.opendaylight.controller.sal-netconf-connector - 1.2.2.Lithium-SR2 | RemoteDevice{controller-config}: Transaction(init) DOM-CHAIN-1-0 FAILED!
TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-topology-operational currently has no leader. Try again later.]]}
2015-11-02 22:12:36,449 | ERROR | CommitFutures-3 | NetconfDeviceTopologyAdapter | 243 - org.opendaylight.controller.sal-netconf-connector - 1.2.2.Lithium-SR2 | RemoteDevice{controller-config}: TransactionChain(org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter@dd96fd0) DOM-CHAIN-1-0 FAILED!
TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-topology-operational currently has no leader. Try again later.]]}
java.util.concurrent.ExecutionException: TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-topology-operational currently has no leader. Try again later.]]}
Caused by: TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-topology-operational currently has no leader. Try again later.]]}
2015-11-02 22:13:02,207 | ERROR | sing-executor-11 | ExecutionList | 72 - com.google.guava - 18.0.0 | RuntimeException while executing runnable com.google.common.util.concurrent.Futures$6@3d203dd5 with executor INSTANCE
2015-11-02 22:13:17,353 | ERROR | config-pusher | DeviceManagerImpl | 280 - org.opendaylight.openflowplugin.impl - 0.1.2.Lithium-SR2 | Creation of node failed.
java.util.concurrent.ExecutionException: TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]}
Caused by: TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]}
2015-11-02 22:13:17,361 | ERROR | config-pusher | ConfigTransactionControllerImpl | 125 - org.opendaylight.controller.config-manager - 0.3.2.Lithium-SR2 | Commit failed on ModuleIdentifier{factoryName='openflow-plugin-provider-impl', instanceName='openflow-plugin-provider-impl'}

in transaction TransactionIdentifier

{name='ConfigTransaction-86-88'}

java.lang.IllegalStateException: java.util.concurrent.ExecutionException: TransactionCommitFailedException

{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]}

Caused by: java.util.concurrent.ExecutionException: TransactionCommitFailedException

{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]}

Caused by: TransactionCommitFailedException

{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]}

2015-11-02 22:13:17,464 | ERROR | config-pusher | ConfigRegistryImpl | 125 - org.opendaylight.controller.config-manager - 0.3.2.Lithium-SR2 | Configuration Transaction failed on 2PC, server is unhealthy
Caused by: java.lang.IllegalStateException: java.util.concurrent.ExecutionException: TransactionCommitFailedException

{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]}

Caused by: java.util.concurrent.ExecutionException: TransactionCommitFailedException

{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]}

Caused by: TransactionCommitFailedException

{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]}

root@mininet-vm:/home/mininet/controller-Li/deploy/current/odl/data/log# grep ERROR karaf.log



 Comments   
Comment by Sanjib Mohapatra [ 03/Nov/15 ]

Attachment c1.karaf.log.rar has been added with description: karaf logs of controller1 (c1)

Comment by Tom Pantelis [ 18/Feb/16 ]

This appears to be an issue with OF, specifically the "lithium" plugin. The module seems to fail fast if it can't write to the datastore on startup. If the issue still exists post Li SR2, please open a bug in the OF project.

Generated at Wed Feb 07 19:55:33 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.