[CONTROLLER-1694] unsubscribe-dtcl fails with NullPointerException when UnreachableMember happens Created: 22/May/17  Updated: 25/Jul/23  Resolved: 29/May/17

Status: Resolved
Project: controller
Component/s: clustering
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Vratko Polak Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


External issue ID: 8525

 Description   

This happens in a test with a module-based shard and the tell-based protocol. A listener is registered (on a member which is not the leader). There is a transaction writer on each member, and each finishes successfully [0]. The unsubscribe-dtcl call fails [1]:
Problem accessing /restconf/operations/odl-mdsal-lowlevel-control:unsubscribe-dtcl. Reason:
Server Error
Caused by:
java.lang.NullPointerException
at org.opendaylight.controller.clustering.it.provider.impl.IdIntsListener.checkEqual(IdIntsListener.java:57)
at org.opendaylight.controller.clustering.it.provider.MdsalLowLevelTestProvider.unsubscribeDtcl(MdsalLowLevelTestProvider.java:480)

If the listener got zero notifications, unsubscribe-dtcl should report that in a readable message. The clustering bug is that the DTC notifications were lost (or the NPE happened for some other reason).
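
A minimal sketch of the readable-message behaviour requested above, assuming the NPE comes from comparing against state that stays null until the first notification arrives. The names SketchListener, hasTriggered and unsubscribe are hypothetical stand-ins for illustration; this is not the actual IdIntsListener or MdsalLowLevelTestProvider code.

```java
final class ListenerCheckSketch {

    /** Hypothetical stand-in for the test listener; not the real IdIntsListener. */
    static final class SketchListener {
        // Stays null until the first notification is delivered.
        private Object lastNotification;

        void onNotification(Object data) {
            lastNotification = data;
        }

        boolean hasTriggered() {
            return lastNotification != null;
        }

        boolean checkEqual(Object expected) {
            // Only safe to call once hasTriggered() is true.
            return lastNotification.equals(expected);
        }
    }

    /** Hypothetical unsubscribe step: guard the comparison instead of hitting an NPE. */
    static String unsubscribe(SketchListener listener, Object expected) {
        if (!listener.hasTriggered()) {
            return "error: listener did not receive any notifications";
        }
        return listener.checkEqual(expected)
                ? "ok"
                : "error: last notification differs from the expected data";
    }

    public static void main(String[] args) {
        SketchListener listener = new SketchListener();
        System.out.println(unsubscribe(listener, "data"));
        listener.onNotification("data");
        System.out.println(unsubscribe(listener, "data"));
    }
}
```

With a guard like this the caller sees an explicit "no notifications" error, which keeps the lost-notification problem visible without the stack trace.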

By the way, the UnreachableMember seems to happen in every test with 3 writers/producers (with an array of half a million integers created in the initial transaction).

[0] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/715/archives/log.html.gz#s1-s36-t3-k2-k15
[1] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/715/archives/log.html.gz#s1-s36-t3-k2-k17-k2-k1-k4-k7-k1



 Comments   
Comment by Vratko Polak [ 22/May/17 ]

> module-based shard, tell-based protocol

For prefix-based shards, unsubscribe-ddtl simply does not respond [2] within 30 seconds.
As usual, no listener-related message is seen in karaf.log [3] between 02:07:15,110 and 02:09:34,362.

Note that due to an error in the test, the removed replica is on the follower (where the listener is). Tests which remove the leader replica end up with CONTROLLER-1693 instead.

[2] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/715/archives/log.html.gz#s1-s38-t3-k2-k17-k2-k1-k4-k6
[3] https://logs.opendaylight.org/releng/jenkins092/controller-csit-3node-clustering-only-carbon/715/archives/odl2_karaf.log.gz

Comment by Tomas Cere [ 22/May/17 ]

(In reply to Vratko Polák from comment #1)
>
> For prefix-based shards, unsubscribe-ddtl simply does not respond [2] within
> 30 seconds.
> As usual, no listener-related message is seen in karaf.log [3] between
> 02:07:15,110 and 02:09:34,362.
>

Since unsubscribe-ddtl/dtcl does a read, there needs to be the same timeout here as there is for the other transactions.
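
A rough sketch of bounding such a read with a timeout, assuming the read is exposed as a java.util.concurrent.Future. The readWithTimeout helper and its messages are hypothetical, and the real test provider may use a different future type and timeout value.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedReadSketch {

    /** Hypothetical helper: wait for a read, but never longer than the given timeout. */
    static <T> String readWithTimeout(Future<T> read, long timeout, TimeUnit unit) {
        try {
            return "read ok: " + read.get(timeout, unit);
        } catch (TimeoutException e) {
            // Give up on the read so the RPC can answer instead of hanging.
            read.cancel(true);
            return "error: read did not complete within " + timeout + " " + unit;
        } catch (ExecutionException e) {
            return "error: read failed: " + e.getCause();
        } catch (InterruptedException e) {
            // Restore the interrupt flag for callers further up the stack.
            Thread.currentThread().interrupt();
            return "error: interrupted while waiting for the read";
        }
    }

    public static void main(String[] args) {
        // A future that never completes stands in for a read that gets no response.
        System.out.println(readWithTimeout(new CompletableFuture<String>(), 100, TimeUnit.MILLISECONDS));
        System.out.println(readWithTimeout(CompletableFuture.completedFuture("data"), 100, TimeUnit.MILLISECONDS));
    }
}
```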

The NPE in unsubscribe-dtcl happened because the listener didn't receive any notifications.

Comment by Tomas Cere [ 22/May/17 ]

https://git.opendaylight.org/gerrit/#/c/57621/

This patch only prevents the NPE from being reported.

Comment by Tomas Cere [ 24/May/17 ]

https://git.opendaylight.org/gerrit/#/c/57763/
https://git.opendaylight.org/gerrit/#/c/57764/
