[CONTROLLER-1613] Clustering: Member fails to re-start sometimes in csit -all- jobs Created: 27/Feb/17 Updated: 25/Jul/23 Resolved: 10/Apr/17 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Vratko Polak | Assignee: | Andrej Mak |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| External issue ID: | 7858 |
| Description |
|
The first symptom detected by Robot suites is constant 404 on jolokia URL. This Carbon bug does not happen in Karaf.log for this Bug contains reports on runtime-generated-mapping not finishing, followed by "giving up" from AbstractDataStore:
2017-02-26 04:02:37,092 | WARN | saction-32-34'}} | DeadlockMonitor | 131 - org.opendaylight.controller.config-manager - 0.6.0.SNAPSHOT | ModuleIdentifier {factoryName='runtime-generated-mapping', instanceName='runtime-mapping-singleton'} did not finish after 169982 ms
Possibly this is just a performance Bug (startup taking longer than expected), but when the restart succeeds [1], the instance is created in around 10 seconds. Most probably some ODL project tends to make WaitingServiceTracker not find BindingToNormalizedNodeCodec.
[0] https://jenkins.opendaylight.org/releng/view/controller/job/controller-csit-3node-clustering-all-carbon/221/ |
| Comments |
| Comment by Robert Varga [ 27/Mar/17 ] |
|
Is this still present? |
| Comment by Vratko Polak [ 28/Mar/17 ] |
|
> Is this still present?
Yes, this still affects around half of netconf [2] runs. Both 404 in robot [3] and "giving up" in karaf.log [4] are there.
[2] https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-3node-clustering-all-carbon/ |
| Comment by Tomas Cere [ 31/Mar/17 ] |
|
Seems like the underlying cause is that the rejoining node could not rejoin the cluster:
2017-03-27 08:54:01,343 | WARN | ult-dispatcher-5 | JoinSeedNodeProcess | 155 - com.typesafe.akka.slf4j - 2.4.17 | Couldn't join seed nodes after [15] attempts, will try again. seed-nodes=[akka.tcp://opendaylight-cluster-data@10.29.12.12:2550, akka.tcp://opendaylight-cluster-data@10.29.13.54:2550]
These are all over the logs, which explains the shard not being able to elect a leader. Now why can't the node rejoin? Is it possible that there is an environment issue or an issue with the rejoin script? |
| Comment by Vratko Polak [ 04/Apr/17 ] |
|
> is it possible that there is an environment issue or an issue with the rejoin script?
Unlikely. The suite is the same as in [5].
[5] https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-3node-clustering-only-carbon |
| Comment by Andrej Mak [ 10/Apr/17 ] |
|
Last runs of https://jenkins.opendaylight.org/releng/view/netconf/job/netconf-csit-3node-clustering-all-carbon/ passed, so issue seems to be resolved for now. |
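For anyone triaging a similar run later, the two karaf.log signatures quoted in this issue (the seed-node join warning and the "did not finish after ... ms" warning) can be counted with a small script. This is only a sketch, not part of the CSIT suites: the function name `scan_karaf_log` and the signature labels are hypothetical, and it assumes the pipe-separated layout of the log lines quoted above.

```python
import re
from collections import Counter

# Hypothetical labels for the two warnings quoted in this issue.
SIGNATURES = {
    "seed_join_failed": re.compile(r"Couldn't join seed nodes after \[(\d+)\]"),
    "module_not_finished": re.compile(r"did not finish after (\d+) ms"),
}


def scan_karaf_log(lines):
    """Count occurrences of each known signature in karaf.log lines.

    Assumes the layout seen in the quoted lines:
    timestamp | level | thread | logger | bundle | message
    """
    counts = Counter()
    for line in lines:
        # Split into at most six fields so pipes inside the message survive.
        fields = [field.strip() for field in line.split(" | ", 5)]
        if len(fields) < 6:
            continue  # continuation line (e.g. stack trace) or other format
        message = fields[5]
        for name, pattern in SIGNATURES.items():
            if pattern.search(message):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python scan_karaf.py path/to/karaf.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for name, count in scan_karaf_log(handle).items():
            print(f"{name}: {count}")
```

A high count of seed-join warnings on the restarted member would point at the cluster rejoin problem rather than at slow startup.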