[MDSAL-212] Isolated node is not notified by EOS (when EOS leader is on different node) Created: 07/Dec/16  Updated: 09/Mar/18  Resolved: 14/Feb/17

Status: Resolved
Project: mdsal
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Reporter: Michal Rehak Assignee: Tomas Cere
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: File local-eos-20161213.tgz    
Issue Links:
Blocks
blocks BGPCEP-601 B and C: rejoining isolated node caus... Resolved
External issue ID: 7322

 Description   

Given that EOS and cluster-singleton-service are in use, the shard leader for EOS is located on one cluster node, and the service owner (elected by cluster-singleton-service) is on another cluster node.

When the service owner's node is isolated, cluster-singleton-service sends no notification to the service owner. As a result, the service owner remains active until the cluster heals. Only when the isolated node rejoins the cluster does the notification arrive and the service stop. This should have happened right after the isolation step.



 Comments   
Comment by Jakub Morvay [ 13/Dec/16 ]

Hi Michal,

Can you also provide logs for this bug?

Comment by Michal Rehak [ 13/Dec/16 ]

Attachment local-eos-20161213.tgz has been added with description: eos-css broken scenario log

Comment by Michal Rehak [ 13/Dec/16 ]

Karaf logs:
1-10.25.2.11.data
2-10.25.2.12.data
3-10.25.2.13.data

Isolation log:
isolation.log

Jolokia log:
jolokia.log

Merged karaf logs:
local-eos-20161213.log

Merged jolokia+isolation+service events:
local-eos-20161213.overview

(To view the raw color sequences in human-friendly form, use less -R.)

Comment by Viera Zelcamova [ 15/Dec/16 ]

Will have to wait until January; no resources available.

Comment by Tomas Cere [ 09/Feb/17 ]

https://git.opendaylight.org/gerrit/#/c/51621/

This seems to happen only when some other service managed by the singleton service throws an unexpected/uncaught exception, which seems to stop the shutdown of all other services. Once the cluster heals, the singleton service may receive another ownershipChanged notification, which makes it seem as if the attempt to close happens only after the cluster heals.
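
For illustration only, a minimal Java sketch of the failure mode described above. It is not the code from the Gerrit change, and every name in it (CloseAllSketch, ManagedService, closeInstance, closeAllNaive, closeAllGuarded) is hypothetical rather than part of the mdsal API. It assumes services are closed one after another in a plain loop, and shows how one uncaught runtime exception can prevent the remaining services from being closed, plus one possible way to contain it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from mdsal.
final class CloseAllSketch {

    /** Stand-in for a service managed by the singleton service. */
    public interface ManagedService {
        void closeInstance();
    }

    /** Naive shutdown: the first uncaught exception aborts the loop,
     *  so every service after the failing one stays active. */
    public static int closeAllNaive(List<ManagedService> services) {
        int closed = 0;
        for (ManagedService service : services) {
            service.closeInstance();
            closed++;
        }
        return closed;
    }

    /** Guarded shutdown: a failure in one service is reported and the
     *  loop continues with the remaining services. */
    public static int closeAllGuarded(List<ManagedService> services) {
        int closed = 0;
        for (ManagedService service : services) {
            try {
                service.closeInstance();
                closed++;
            } catch (RuntimeException e) {
                System.err.println("closeInstance failed, continuing: " + e);
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        List<String> closedNames = new ArrayList<>();
        List<ManagedService> services = new ArrayList<>();
        services.add(() -> { throw new IllegalStateException("unexpected failure"); });
        services.add(() -> closedNames.add("healthy-service"));

        try {
            closeAllNaive(services);
        } catch (IllegalStateException e) {
            System.out.println("naive loop aborted, closed so far: " + closedNames);
        }
        int closed = closeAllGuarded(services);
        System.out.println("guarded loop closed " + closed + ": " + closedNames);
    }
}
```

In the naive loop the healthy service is never reached, which would match the observed symptom of a service staying active until a later notification triggers another close attempt.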

Generated at Wed Feb 07 20:09:03 UTC 2024 using Jira 8.20.10#820010-sha1:ace47f9899e9ee25d7157d59aa17ab06aee30d3d.