[CONTROLLER-1759] IncompatibleClassChangeError in RPC test Created: 25/Aug/17 Updated: 12/Sep/17 Resolved: 12/Sep/17 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | Nitrogen |
| Fix Version/s: | None |
| Type: | Bug | ||
| Reporter: | Vratko Polak | Assignee: | Vratko Polak |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Operating System: All |
| Attachments: |
|
| External issue ID: | 9057 |
| Description |
|
This is a symptom of a failing 3-node CSIT job, so I guess it is a clustering bug, but it might be something else (we do not have a corresponding 1-node test). This happens on current Nitrogen builds. As the test is stable and passing on Carbon, this is a regression, so I am marking it as a blocker. Here [0] is the robot failure, but there are similar failures in other suites of the same job as well. Copy of the exception:
Sandbox shows [1] that the same error happens even without the preceding reset suite, so it is unlikely to be caused just by ODL booting up slowly. But I have not looked very deeply; perhaps the bootup process did get stuck somewhere. Karaf.log [2] does not show much detail even on DEBUG level (but I am not sure whether the "lagging entries" lines are relevant):
[0] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-nitrogen/124/log.html.gz#s1-s2-t1-k2-k1-k1-k1-k1-k3-k1-k4-k7-k1 |
| Comments |
| Comment by Kit Lou [ 29/Aug/17 ] |
|
We should verify with the next Nitrogen RC build, as many Carbon-related patches have been merged. |
| Comment by Tom Pantelis [ 31/Aug/17 ] |
|
IncompatibleClassChangeError indicates a binary incompatibility or maybe a class loader issue, i.e. perhaps bundle(s) are being refreshed/restarted. This is not a clustering issue. Unfortunately, the link at [2] no longer exists. In the future it would be better to attach log files to the bug (assuming they're not too big). |
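To make the "binary incompatibility" explanation concrete, here is a generic, self-contained sketch (not taken from the controller code; `Api`, `Client` and `IcceDemo` are hypothetical names). It compiles a client against one version of a type and then runs the stale client binary against a newer version in which the class has become an interface. The client's source would still compile, but its existing bytecode refers to `Api` as a class, and the JVM rejects that at link time with an IncompatibleClassChangeError. In OSGi terms this resembles a bundle built against one API version being wired to another at runtime. It assumes a JDK (javac is invoked through `javax.tools`) and Java 11+ (`Files.writeString`); temporary directories are left behind.

```java
import java.lang.reflect.InvocationTargetException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class IcceDemo {

    // Compiles one source file into outDir, resolving dependencies from classpath.
    static void compile(String className, String source, Path outDir, Path classpath) throws Exception {
        Path srcDir = Files.createTempDirectory("src"); // keep sources apart from class output
        Path srcFile = srcDir.resolve(className + ".java");
        Files.writeString(srcFile, source);
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null on a JRE without javac
        if (javac == null) {
            throw new IllegalStateException("this demo needs a JDK");
        }
        int rc = javac.run(null, null, null,
                "-cp", classpath.toString(), "-d", outDir.toString(), srcFile.toString());
        if (rc != 0) {
            throw new IllegalStateException("javac failed for " + className);
        }
    }

    /** Runs a Client built against "class Api" on top of a newer "interface Api"; returns what it threw. */
    static Throwable runAgainstNewApi() throws Exception {
        Path oldApi = Files.createTempDirectory("api-v1");
        Path newApi = Files.createTempDirectory("api-v2");
        Path client = Files.createTempDirectory("client");

        // Version 1: Api is a class, and Client is compiled against it.
        compile("Api", "public class Api { public static String hello() { return \"v1\"; } }", oldApi, oldApi);
        compile("Client", "public class Client { public static String run() { return Api.hello(); } }", client, oldApi);

        // Version 2: Api becomes an interface; Client's old binary is now stale.
        compile("Api", "public interface Api { static String hello() { return \"v2\"; } }", newApi, newApi);

        // Runtime: stale Client binary wired to the new Api binary (parent null = bootstrap classes only).
        try (URLClassLoader loader = new URLClassLoader(
                new URL[] {newApi.toUri().toURL(), client.toUri().toURL()}, null)) {
            try {
                loader.loadClass("Client").getMethod("run").invoke(null);
                return null; // no linkage error occurred
            } catch (InvocationTargetException e) {
                return e.getCause(); // the error raised while linking/executing Client.run()
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAgainstNewApi());
    }
}
```

The failure only appears when `Client.run()` first executes the call, which matches the observation that the error surfaces during a specific RPC test rather than at startup.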
| Comment by Vratko Polak [ 06/Sep/17 ] |
|
The controller clustering job has not seen this symptom for some time now. Marking as fixed. |
| Comment by Vratko Polak [ 06/Sep/17 ] |
|
Oops, sorry, I was looking at the wrong job. The IncompatibleClassChangeError [3] is still there. |
| Comment by Vratko Polak [ 06/Sep/17 ] |
|
> perhaps bundle(s) are being refreshed/restarted
No restarts, and there are only two lines for refreshes, considered benign so far:
2017-09-06 03:35:32,175 | INFO | pool-2-thread-1 | FeaturesServiceImpl | 6 - org.apache.karaf.features.core - 4.0.9 | org.apache.servicemix.bundles.jasypt/1.9.2.1 (Should be wired to: javax.servlet-api/3.1.0 (through [org.apache.servicemix.bundles.jasypt/1.9.2.1] osgi.wiring.package; filter:="(osgi.wiring.package=javax.servlet)"; resolution:=optional)) |
| Comment by Vratko Polak [ 07/Sep/17 ] |
|
I have run a single test on Sandbox with a lot of debug logging enabled, but I do not see much that is relevant in the karaf log [4]. Perhaps something is wrong with blueprint initialization? Here is the part just before the failure:
2017-09-07 10:08:15,360 | DEBUG | egister-constant | HttpInput | 147 - org.eclipse.jetty.util - 9.2.19.v20160908 | HttpInputOverHTTP@71ab47b8 eof EOF
2017-09-07 10:08:15,361 | DEBUG | egister-constant | ServiceRecipe | 71 - org.apache.aries.blueprint.core - 1.7.1 | Method entry: getService, args org.opendaylight.controller.md.sal.dom.broker.impl.DOMRpcRouter@3f26a886
2017-09-07 10:08:15,362 | DEBUG | egister-constant | ProxySubclassGenerator | 79 - org.apache.aries.proxy.impl - 1.0.5 | Method entry: isProxySubclass, args [class org.opendaylight.controller.md.sal.dom.broker.impl.DOMRpcRouter]
2017-09-07 10:08:15,362 | DEBUG | egister-constant | ProxySubclassGenerator | 79 - org.apache.aries.proxy.impl - 1.0.5 | Method exit: isProxySubclass, returning false
2017-09-07 10:08:15,362 | DEBUG | egister-constant | ServiceRecipe | 71 - org.apache.aries.blueprint.core - 1.7.1 | Method exit: getService, returning org.opendaylight.controller.md.sal.dom.broker.impl.DOMRpcRouter@3f26a886
2017-09-07 10:08:15,367 | DEBUG | egister-constant | MdsalLowLevelTestProvider | 214 - org.opendaylight.controller.samples.clustering-it-provider - 1.6.0.SNAPSHOT | Received register-constant rpc, input: RegisterConstantInput{getConstant=constant-1, augmentations={}}
2017-09-07 10:08:15,368 | DEBUG | egister-constant | GetConstantService | 214 - org.opendaylight.controller.samples.clustering-it-provider - 1.6.0.SNAPSHOT | Registering get-constant service, constant value: constant-1
2017-09-07 10:08:15,368 | DEBUG | egister-constant | ServiceRecipe | 71 - org.apache.aries.blueprint.core - 1.7.1 | Retrieving service for bundle org.opendaylight.controller.samples.clustering-it-provider_1.6.0.SNAPSHOT [214] and service registration {org.opendaylight.controller.md.sal.dom.api.DOMRpcService, org.opendaylight.controller.md.sal.dom.api.DOMRpcProviderService} = {osgi.service.blueprint.compname=domRpcRouter, type=default, service.id=204, service.bundleid=197, service.scope=bundle}
2017-09-07 10:08:15,368 | DEBUG | egister-constant | ServiceRecipe | 71 - org.apache.aries.blueprint.core - 1.7.1 | Method entry: getService, args org.opendaylight.controller.md.sal.dom.broker.impl.DOMRpcRouter@3f26a886 |
| Comment by Vratko Polak [ 07/Sep/17 ] |
|
> it would be better to attach log files to the bug
With XZ compression, [4] is small enough; attached. |
| Comment by Vratko Polak [ 07/Sep/17 ] |
|
Attachment 2017-09-07_odl1_karaf.log.xz has been added with description: XZ compressed karaf log |
| Comment by Vratko Polak [ 07/Sep/17 ] |
|
Currently, I think this is related to a recent change [5] which moves the registerRpcImplementation method to be a default method of the DOMRpcProviderService interface. For some reason, clustering-test-app, when built, picks up a codebase where registerRpcImplementation is still a method of the DOMRpcRouter class. A partial revert [6] of the said change fixes [7] this Bug on Sandbox. I have verified that
[5] https://git.opendaylight.org/gerrit/#/c/51035/11/opendaylight/md-sal/sal-dom-api/src/main/java/org/opendaylight/controller/md/sal/dom/api/DOMRpcProviderService.java |
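One way to check which shape of the API a running app actually sees is reflection: `Class.getMethod` also finds default methods inherited from superinterfaces, and `Method.isDefault()` distinguishes a default interface method from one declared on the class. The sketch below is a minimal illustration with hypothetical stand-ins (`ProviderService`, `Router`, `OldRouter` and `registerImplementation` are not the real controller types or signatures); it only mirrors the "default method on the interface" versus "method on the class" layouts described above.

```java
import java.lang.reflect.Method;

public class DefaultMethodCheck {

    // Hypothetical "new" layout: the interface owns the method as a default method.
    interface ProviderService {
        default String registerImplementation(String name) {
            return "registered " + name;
        }
    }

    // The router class just inherits the default method.
    static class Router implements ProviderService {
    }

    // Hypothetical "old" layout: the class declares the method itself.
    static class OldRouter {
        public String registerImplementation(String name) {
            return "registered " + name;
        }
    }

    /** Reports which type declares the public method name(String) and whether it is a default method. */
    static String describe(Class<?> type, String name) throws NoSuchMethodException {
        Method m = type.getMethod(name, String.class); // also searches superinterfaces
        return m.getDeclaringClass().getSimpleName()
                + (m.isDefault() ? " (default method)" : " (declared method)");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describe(Router.class, "registerImplementation"));    // ProviderService (default method)
        System.out.println(describe(OldRouter.class, "registerImplementation")); // OldRouter (declared method)
    }
}
```

The same two calls tell the layouts apart without relying on source: the declaring type and the `isDefault()` flag come from the loaded class files.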
| Comment by Vratko Polak [ 07/Sep/17 ] |
|
> if a pom improvement [8] is a fix.
It is not.
> Partial revert [6]
Sorry, it is a full revert (now patch set 2). It is currently my candidate for the fix, waiting for review. |
| Comment by Vratko Polak [ 08/Sep/17 ] |
|
I have found another consequence of this Bug: Singleton service suites tend to crash the JVM. The Robot symptom is [9] "Connection reset by peer"; the cause is printed to stderr [10], identifying it as this Bug:
Uncaught error from thread [opendaylight-cluster-data-akka.actor.default-dispatcher-6] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[opendaylight-cluster-data]
The fix [6] makes the suites pass [11] again.
[9] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-all-nitrogen/141/log.html.gz#s1-s10-t1-k3-k1-k3-k1-k3-k1-k3-k1 |
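A plausible reason this Bug takes down the whole JVM rather than just failing one RPC: IncompatibleClassChangeError is a subclass of LinkageError, and Scala's `scala.util.control.NonFatal`, which Akka uses to separate recoverable failures from fatal ones, treats LinkageError as fatal. With `akka.jvm-exit-on-fatal-error` enabled, a fatal error escaping a dispatcher thread shuts the JVM down. The sketch below is a minimal Java mirror of that classification, assuming the NonFatal rules; `FatalCheck` and `isNonFatal` are hypothetical names, and Scala's `ControlThrowable` and the deprecated `ThreadDeath` are left out.

```java
public class FatalCheck {

    /**
     * Mirrors the shape of scala.util.control.NonFatal: VirtualMachineError,
     * InterruptedException and LinkageError count as fatal. (NonFatal also treats
     * ThreadDeath and Scala's ControlThrowable as fatal; both are omitted here.)
     */
    static boolean isNonFatal(Throwable t) {
        return !(t instanceof VirtualMachineError
                || t instanceof InterruptedException
                || t instanceof LinkageError);
    }

    public static void main(String[] args) {
        Throwable icce = new IncompatibleClassChangeError("example");
        System.out.println("LinkageError? " + (icce instanceof LinkageError));              // prints "LinkageError? true"
        System.out.println("non-fatal? " + isNonFatal(icce));                               // prints "non-fatal? false"
        System.out.println("non-fatal? " + isNonFatal(new IllegalStateException("oops")));  // prints "non-fatal? true"
    }
}
```

An ordinary runtime exception would be handled by actor supervision, whereas the linkage error falls into the fatal branch, which is consistent with the "shutting down JVM" message above.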
| Comment by Kit Lou [ 08/Sep/17 ] |
|
Vratko, does this mean merging [6] will address this blocker? Thanks! |
| Comment by Vratko Polak [ 12/Sep/17 ] |
|
> merging [6] will address this blocker?
Yes, and it is merged now. |