Re: [PATCH 2/2] block: create ioctl to discard-or-zeroout a range of blocks

From: Ric Wheeler
Date: Mon Mar 14 2016 - 06:34:17 EST


On 03/13/2016 07:30 PM, Dave Chinner wrote:
On Fri, Mar 11, 2016 at 04:44:16PM -0800, Linus Torvalds wrote:
On Fri, Mar 11, 2016 at 4:35 PM, Theodore Ts'o <tytso@xxxxxxx> wrote:
At the end of the day it's about whether you trust the userspace
program or not.
There's a big difference between "give the user rope", and "tie the
rope in a noose and put a banana peel so that the user might stumble
into the rope and hang himself", though.

So I do think that Dave is right that we should also strive to make
sure that our interfaces are not just secure in theory, but that they
are also good interfaces to make mistakes less likely.
At which point I have to ask: how do we safely allow filesystems to
expose stale data in files? There's a big "we need to trust
userspace" component in every proposal that has been made so far -
that's the part I have extreme trouble with.

For example, what happens when a backup process running as root reads a
file that has exposed stale data? Yes, we could set the "NODUMP"
flag on the inode to tell backup programs to skip backing up such
files, but we're now trusting some random userspace application
(e.g. tar, rsync, etc) not to do something we don't want it to do
with the data in that file.

AFAICT, we can't stop root from copying files that have exposed
stale data or changing their ownership without some kind of special
handling of "contains stale data" files within the kernel. At this
point we are back to needing persistent tracking of the "exposed
stale data" state in the inode as the only safe way to allow us to
expose stale data. That's fairly ironic given that the stated
purpose of exposing stale data through fallocate is to avoid the
overhead of the existing mechanisms we use to track extents
containing stale data....

I think that once we enter this mode, the local file system has effectively ceded its role of preventing stale data exposure to the upper layer. In effect, this ceases to be a normal file system for any enabled process if we control this through fallocate(), or for all processes if we use the brute force mount option that would be file system wide.

That means we would not need to track this. Extents would be marked as if they had always held valid data (no more allocated but unwritten state).
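
For anyone not familiar with the "allocated but unwritten" state: it is the reason preallocated space reads back as zeros today rather than as whatever was previously on disk. A minimal sketch of that user-visible behaviour in Python follows. The helper name preallocated_bytes and the 1 MiB size are made up for illustration, and this only shows what a reader sees, not how any particular file system tracks the state internally.

```python
import os
import tempfile


def preallocated_bytes(size: int = 1 << 20) -> bytes:
    """Preallocate `size` bytes in a fresh temporary file and return
    what a plain read() of that range sees.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryFile() as f:
        # Reserve blocks for the range without writing any data to it.
        os.posix_fallocate(f.fileno(), 0, size)
        # Read the preallocated range back from offset 0.
        return f.read(size)


data = preallocated_bytes()
# The range was never written, so every byte is expected to be zero.
print(len(data), data == bytes(len(data)))
```

With the behaviour proposed in this thread, the zero guarantee in that last check is exactly what an enabled process would be giving up.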

In the end, that is the actual goal - move this enforcement up a layer for overlay/user space file systems that are then responsible for policing this kind of thing.

Regards,

Ric


I think we _should_ give users rope, but maybe we should also make
sure that there isn't some hidden rapidly spinning saw-blade right
next to the rope that the user doesn't even think about.
IMO we already have a good, safe interface that provides the rope
without the saw blades. I'm happy to be proven wrong, but IMO I
don't see that we can provide stale data exposure in a safe,
non-saw-bladey way without any kernel/filesystem side overhead.....

Cheers,

Dave.