[CONTROLLER-1487] entity structures are kept even when the entity is removed. can be used as DOS attack Created: 20/Feb/16 Updated: 25/Jul/23 Resolved: 22/May/18 |
|
| Status: | Resolved |
| Project: | controller |
| Component/s: | clustering |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug |
| Reporter: | Jamo Luhrsen |
| Assignee: | Unassigned |
| Resolution: | Won't Do |
| Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Operating System: All |
| Issue Links: | |
| External issue ID: | 5397 |
| Description |
|
when a node connects to ovsdb southbound, an entity-owners entry is created for it, and that entry remains even after the node is disconnected/removed:

entity-owners when node connected: [JSON snippet]

entity-owners when node disconnected: [JSON snippet] |
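For reference, a minimal sketch of how to pull that entity-owners data. The RESTCONF path, port 8181 and admin/admin credentials are assumptions (the Beryllium-era defaults), not something stated in the report:

```python
# Sketch: dump the entity-owners operational data so stale, candidate-less
# entities can be spotted. URL, port and credentials are assumptions.
import requests

ODL_IP = "127.0.0.1"  # assumption: controller address
url = "http://%s:8181/restconf/operational/entity-owners:entity-owners" % ODL_IP

resp = requests.get(url, auth=("admin", "admin"))
resp.raise_for_status()
print(resp.json())
```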
| Comments |
| Comment by Anil Vishnoi [ 21/Feb/16 ] |
|
Hi Jamo,

As per the discussion on the following thread, this is expected behavior.

https://lists.opendaylight.org/pipermail/integration-dev/2016-February/005957.html |
| Comment by Jamo Luhrsen [ 23/Feb/16 ] |
|
the root issue here is that the entity owner service (EOS) does not clean up the entry when it goes "candidateless". So in the case where a lot of new/unique nodes come and go, these empty entries pile up and are never removed.

one quick way to see this is to:

start an ovs bridge:
sudo ovs-vsctl add-br memLeakerBridge

connect it to openflowplugin (NOTE: feature installed is openflowplugin-flow-services-rest):
sudo ovs-vsctl set-controller memLeakerBridge tcp:${ODL_IP}:6633

cycle through mac addresses:
sudo ovs-vsctl set bridge memLeakerBridge other-config:hwaddr=00:00:00:00:00:01

here's a hack of a python script to do the cycling, built around "for i in xrange(0x00, 0xFF):" (a fleshed-out sketch follows after this comment). |
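A fleshed-out sketch of that cycling hack. Only the xrange(0x00, 0xFF) loop comes from the comment; the bridge name, the use of subprocess to call ovs-vsctl, and the 1-second pause are assumptions:

```python
# Sketch of the MAC-cycling "blaster" script (Python 2, matching the original
# xrange). Assumptions: a bridge named memLeakerBridge already exists and is
# pointed at the controller, and ovs-vsctl is on PATH.
import subprocess
import time

BRIDGE = "memLeakerBridge"  # assumption: matches the bridge created above

for i in xrange(0x00, 0xFF):
    hwaddr = "00:00:00:00:00:%02x" % i
    # Changing the bridge MAC makes it show up as a brand-new node/entity in
    # the controller, leaving the previous entity entry behind.
    subprocess.check_call(
        ["sudo", "ovs-vsctl", "set", "bridge", BRIDGE,
         "other-config:hwaddr=" + hwaddr])
    time.sleep(1)  # give the controller time to register the new node
```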
| Comment by Robert Varga [ 24/Feb/16 ] |
|
As indicated, entities do not have a complete lifecycle (as usual, removal is missing) and this is a bug. |
| Comment by Tom Pantelis [ 24/Feb/16 ] |
|
I've prototyped it, but I'm afraid that removing an entity when "candidateless" will introduce latent timing bugs. I'm seeing timing-related failures even in unit tests. Even if we delay removal via a timer (say 15 sec) and recheck that there are no current candidates, there may be a candidate add transaction in flight, in which case the delete would remove it afterwards (a toy illustration of this race follows after this comment). I don't see any way to alleviate this potential issue with the way the in-memory data tree works.

I'm inclined to leave it as is. The memory footprint for an empty entity node is pretty small, so it would likely take millions to run OOM (depending on how much memory is allocated) and, at least with current use cases, complete removal of an entity should be infrequent.

Wrt the DOS attack, any config yang list is a potential DOS attack. Eg, one could easily keep putting nodes into the inventory node list via restconf and run it OOM. |
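A toy illustration (plain Python, not the actual EOS/Shard code) of the race described above: the purge rechecks for candidates, decides to delete, and a candidate add that is already in flight lands in the gap before the delete commits:

```python
# Toy model only: entities is a dict of entity-id -> set of candidate members.
# All names and timings are illustrative.
import threading
import time

entities = {"ovsdb:node-1": set()}   # entity is currently candidateless

def delayed_purge(entity_id):
    # recheck: still no candidates, so go ahead with the delete
    if entity_id in entities and not entities[entity_id]:
        time.sleep(0.1)                   # gap before the delete actually commits
        entities.pop(entity_id, None)     # wipes any candidate added in the gap

def add_candidate(entity_id, member):
    entities.setdefault(entity_id, set()).add(member)

t = threading.Thread(target=delayed_purge, args=("ovsdb:node-1",))
t.start()
time.sleep(0.05)
add_candidate("ovsdb:node-1", "member-2")  # the in-flight candidate add
t.join()
print(entities)  # {} -- member-2 thinks it is a candidate, but the entity is gone
```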
| Comment by Jamo Luhrsen [ 24/Feb/16 ] |
|
(In reply to Tom Pantelis from comment #4)

I did not specifically keep track of the count, but I think I ran it OOM.

> Wrt to DOS attack, any config yang list is a potential DOS attack. Eg, one
> could easily keep putting nodes to the inventory node list via restconf and
> run it OOM.

fair point, but at least in this specific case we can also issue a delete. |
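For illustration, a sketch of that kind of manual cleanup. The opendaylight-inventory RESTCONF path, node id, port and credentials are assumptions (Beryllium-era defaults); there is no equivalent delete for the stale entity-owners entries themselves, which is the point of this bug:

```python
# Sketch: remove a stale node from the config inventory via RESTCONF.
# URL, node id and credentials are assumptions.
import requests

ODL_IP = "127.0.0.1"      # assumption: controller address
NODE_ID = "openflow:1"    # assumption: id of the stale inventory node

url = ("http://%s:8181/restconf/config/"
       "opendaylight-inventory:nodes/node/%s" % (ODL_IP, NODE_ID))
resp = requests.delete(url, auth=("admin", "admin"))
print(resp.status_code)   # expect 200/204 on success
```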
| Comment by Tom Pantelis [ 24/Feb/16 ] |
|
(In reply to Jamo Luhrsen from comment #5)

The number will depend on how much memory you allocate to the JVM - I think the default is 2G, so thousands would probably do it. But the same issue would occur if you actually had thousands of valid entities. Also, doesn't OVSDB/OF leave the inventory node for a bit of time before purging? If so, that's accounting for memory as well and will be affected by your blaster script. If you run OOM, the process would likely be hosed and you likely wouldn't be able to issue a delete.

I'm not saying we shouldn't remove the entries - I just don't see a safe/foolproof way to do it automatically w/o introducing potential race conditions that would result in difficult sporadic bugs to track down. Definitely not by removing them immediately. Using a timer would be safer and reduce the chance of a race condition. How long is safe? After some minutes to hours we could probably assume the entity won't come back, although it would require more memory for the bookkeeping to track which ones are eligible to purge (a rough sketch of that bookkeeping follows after this comment). But that still wouldn't stop a DOS attack like your script, which would hammer it in seconds. |
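A rough sketch (illustrative only, not controller code) of the extra bookkeeping a timer-based purge would need; the grace period and all names are assumptions:

```python
# Remember when each entity became candidateless and only treat it as
# purgeable after a grace period. This map is the "more memory for the
# bookkeeping" mentioned above.
import time

GRACE_PERIOD = 3600.0          # assumption: one hour before purging
candidateless_since = {}       # entity-id -> timestamp it went empty

def on_candidates_changed(entity_id, candidates):
    if candidates:
        candidateless_since.pop(entity_id, None)
    else:
        candidateless_since.setdefault(entity_id, time.time())

def entities_eligible_for_purge(now=None):
    now = time.time() if now is None else now
    return [eid for eid, since in candidateless_since.items()
            if now - since >= GRACE_PERIOD]
```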
| Comment by Robert Varga [ 24/Feb/16 ] |
|
I do not pretend to understand the EOS implementation, but EntityOwnershipShard is based on Shard, hence it inherently contains an internal DataTree and can control what data is inside it. Since entities are stored in a list, the DataTree does not remove them automatically (as it does for non-presence containers), but it is certainly in the realm of possibility for EntityOwnershipShard to make this list appear and disappear as candidates are added or deleted. This would require a non-trivial amount of surgery, I suspect, probably increasing coupling between Shard and EntityOwnershipShard.

Since we have splitting EOS out on our plate for Boron, I suggest we tackle this problem once the split is done, make that squeaky-clean, and then consider a backport to Beryllium. |
| Comment by Tom Pantelis [ 24/Feb/16 ] |
|
The EntityOwnershipShard has full control over the data. I'll provide an example race condition:
So we end up with a strange condition where a node thinks it has a candidate but the entity is gone. It would also try to write node2 as the owner, but that would fail. This is why we didn't previously deal with entity removal. Maybe there's some trickiness we can do to alleviate this scenario with more EOS/Shard coupling, as Robert mentioned. It's possible the EOS could hook into can-commit/pre-commit (via Tony's new commit cohort stuff), inspect the modification, and abort an entity delete if it still has a candidate (sketched below). |
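A sketch of that idea in plain Python (the real hook would go through the Java commit-cohort API; all names here are illustrative): veto a commit that deletes an entity which still has registered candidates:

```python
# Illustrative pre-commit check, not the actual commit-cohort interface.
class EntityDeleteVeto(object):
    def __init__(self, entity_store):
        # entity_store: dict of entity-id -> set of candidate member names
        self.entity_store = entity_store

    def can_commit(self, modification):
        """modification: list of (operation, entity_id) tuples."""
        for op, entity_id in modification:
            if op == "delete" and self.entity_store.get(entity_id):
                # Abort: a candidate add got in before the delete committed.
                return False
        return True

veto = EntityDeleteVeto({"ovsdb:node-1": {"member-2"}})
print(veto.can_commit([("delete", "ovsdb:node-1")]))  # False -> abort the delete
```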
| Comment by viru [ 11/Aug/16 ] |
|
(In reply to Anil Vishnoi from comment #9)

hi, can you please confirm that |