[CONTROLLER-1441] Clustering : OF ports (6653,6633) not coming up in couple of nodes after all required OF bundle installated in 5 node cluster ( seen very frequently 9 of 10 times in SR2) Created: 03/Nov/15 Updated: 18/Feb/16 Resolved: 18/Feb/16 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | Lithium |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Sanjib Mohapatra | Assignee: | Moiz Raja |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Attachments: |
|
| External issue ID: | 4572 |
| Priority: | Normal |
| Description |
|
Clustering Issue : OF ports (6653,6633) not coming up in couple of nodes after all required OF bundle installed in 5 node cluster ( seen very very frequently 9 outof 10 times in SR2. Please refer below steps. 1. ./deploy_odl.py --clean --distribution=/home/mininet/controller-Li/distribution-karaf-0.3.2-Lithium-SR2.zip --rootdir=/home/mininet/controller-Li --hosts=10.183.181.41,10.183.181.42,10.183.181.43,10.183.181.44,10.183.181.45 --user=root --password=rootroot --template=/multi-node-test 2. root@mininet-vm:~# netstat -na | grep 2550 feature:install odl-clustering-test-app 3. OF ports 6633, 6653 came up in all 4 nodes (c2, c3, c4, c5) excpet c1. c1: c2, c3, c4, c5: 4. checked suspected ERROR logs 2015-11-02 22:13:17,361 | ERROR | config-pusher | ConfigTransactionControllerImpl | 125 - org.opendaylight.controller.config-manager - 0.3.2.Lithium-SR2 | Commit failed on ModuleIdentifier {factoryName='openflow-plugin-provider-impl', instanceName='openflow-plugin-provider-impl'} in transaction TransactionIdentifier{name='ConfigTransaction-86-88'}java.lang.IllegalStateException: java.util.concurrent.ExecutionException: TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]} at org.opendaylight.openflowplugin.impl.device.DeviceManagerImpl.<init>(DeviceManagerImpl.java:155) at org.opendaylight.openflowplugin.impl.OpenFlowPluginProviderImpl.initialize(OpenFlowPluginProviderImpl.java:169) at org.opendaylight.yang.gen.v1.urn.opendaylight.params.xml.ns.yang.config.openflow.plugin.impl.rev150327.OpenFlowProviderModule.createInstance(OpenFlowProviderModule.java:40) at org.opendaylight.controller.config.spi.AbstractModule.getInstance(AbstractModule.java:73) at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)[:1.7.0_65] at java.lang.reflect.Method.invoke(Method.java:606)[:1.7.0_65] at org.opendaylight.controller.config.manager.impl.dependencyresolver.DependencyResolverManager$ModuleInvocationHandler.handleInvocation(DependencyResolverManager.java:150) 5. Giving grep output of all errors in logfile. Also attaching karaf log files. root@mininet-vm:/home/mininet/controller-Li/deploy/current/odl/data/log# grep ERROR karaf.log.1 2015-11-02 22:05:11,708 | ERROR | Event Dispatcher | FeatureDeploymentListener | 21 - org.apache.karaf.deployer.features - 3.0.3 | Unable to install features 2015-11-02 22:10:27,389 | ERROR | config-pusher | DistributedDataStore | 191 - org.opendaylight.controller.sal-distributed-datastore - 1.2.2.Lithium-SR2 | Shared leaders failed to settle in 90 seconds, giving up 2015-11-02 22:11:57,500 | ERROR | config-pusher | DistributedDataStore | 191 - org.opendaylight.controller.sal-distributed-datastore - 1.2.2.Lithium-SR2 | Shared leaders failed to settle in 90 seconds, giving up 2015-11-02 22:12:35,456 | ERROR | CommitFutures-1 | NetconfDeviceDatastoreAdapter | 243 - org.opendaylight.controller.sal-netconf-connector - 1.2.2.Lithium-SR2 | RemoteDevice{controller-config}: TransactionChain(org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter@adef923) DOM-CHAIN-0-0 FAILED! TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]} 2015-11-02 22:12:35,253 | ERROR | CommitFutures-0 | NetconfDeviceDatastoreAdapter | 243 - org.opendaylight.controller.sal-netconf-connector - 1.2.2.Lithium-SR2 | RemoteDevice{controller-config}: Transaction(init) DOM-CHAIN-0-0 FAILED! TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]} 2015-11-02 22:12:36,644 | ERROR | CommitFutures-4 | NetconfDeviceTopologyAdapter | 243 - org.opendaylight.controller.sal-netconf-connector - 1.2.2.Lithium-SR2 | RemoteDevice{controller-config}: Transaction(init) DOM-CHAIN-1-0 FAILED! TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-topology-operational currently has no leader. Try again later.]]} 2015-11-02 22:12:36,449 | ERROR | CommitFutures-3 | NetconfDeviceTopologyAdapter | 243 - org.opendaylight.controller.sal-netconf-connector - 1.2.2.Lithium-SR2 | RemoteDevice{controller-config}: TransactionChain(org.opendaylight.controller.md.sal.binding.impl.BindingDOMTransactionChainAdapter@dd96fd0) DOM-CHAIN-1-0 FAILED! TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-topology-operational currently has no leader. Try again later.]]} java.util.concurrent.ExecutionException: TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-topology-operational currently has no leader. Try again later.]]} Caused by: TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-topology-operational currently has no leader. Try again later.]]} 2015-11-02 22:13:02,207 | ERROR | sing-executor-11 | ExecutionList | 72 - com.google.guava - 18.0.0 | RuntimeException while executing runnable com.google.common.util.concurrent.Futures$6@3d203dd5 with executor INSTANCE 2015-11-02 22:13:17,353 | ERROR | config-pusher | DeviceManagerImpl | 280 - org.opendaylight.openflowplugin.impl - 0.1.2.Lithium-SR2 | Creation of node failed. java.util.concurrent.ExecutionException: TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]} Caused by: TransactionCommitFailedException{message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]} 2015-11-02 22:13:17,361 | ERROR | config-pusher | ConfigTransactionControllerImpl | 125 - org.opendaylight.controller.config-manager - 0.3.2.Lithium-SR2 | Commit failed on ModuleIdentifier{factoryName='openflow-plugin-provider-impl', instanceName='openflow-plugin-provider-impl'} in transaction TransactionIdentifier {name='ConfigTransaction-86-88'}java.lang.IllegalStateException: java.util.concurrent.ExecutionException: TransactionCommitFailedException {message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]}Caused by: java.util.concurrent.ExecutionException: TransactionCommitFailedException {message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]}Caused by: TransactionCommitFailedException {message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]}2015-11-02 22:13:17,464 | ERROR | config-pusher | ConfigRegistryImpl | 125 - org.opendaylight.controller.config-manager - 0.3.2.Lithium-SR2 | Configuration Transaction failed on 2PC, server is unhealthy Caused by: java.util.concurrent.ExecutionException: TransactionCommitFailedException {message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]}Caused by: TransactionCommitFailedException {message=canCommit encountered an unexpected failure, errorList=[RpcError [message=canCommit encountered an unexpected failure, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-1-shard-inventory-operational currently has no leader. Try again later.]]}root@mininet-vm:/home/mininet/controller-Li/deploy/current/odl/data/log# grep ERROR karaf.log |
| Comments |
| Comment by Sanjib Mohapatra [ 03/Nov/15 ] |
|
Attachment c1.karaf.log.rar has been added with description: karaf logs of controller1 (c1) |
| Comment by Tom Pantelis [ 18/Feb/16 ] |
|
This appears to be an issue with OF, specifically the "lithium" plugin. The module seems to fail fast if it can't write to the datastore on startup. If the issue still exists post Li SR2, please open a bug in the OF project. |