Re: [PATCH 2/2] block: create ioctl to discard-or-zeroout a range of blocks
From: Theodore Ts'o
Date: Thu Mar 10 2016 - 16:47:29 EST
On Thu, Mar 10, 2016 at 10:33:49AM -0800, Linus Torvalds wrote:
> On Thu, Mar 10, 2016 at 6:58 AM, Ric Wheeler <ricwheeler@xxxxxxxxx> wrote:
> > What was objectionable at the time this patch was raised years back (not
> > just to me, but to pretty much every fs developer at LSF/MM that year)
> > centered on the concern that this would be viewed as a "performance" mode
> > and we get pressure to support this for non-priveleged users. It gives any
> > user effectively the ability to read the block device content for previously
> > allocated data without restriction.
Sure, but it was never "any user". We always had group-based
permissions from the beginning. Sure, we passed it in via a mount
option which was a bit hacky, but we never got to the point of
discussing the way that we would modulate the access --- the complaint
seemed to be that if it was a non-root user, it was an unacceptable
security hole. And the pushback I got was more in the way of a
religious objection more than anything else. Heck, even reserving a
code point for the out-of-tree patch received a huge amount of
> The sane way to do it would be to just check permissions of the
> underlying block device.
> That way, people can just set the permissions for that to whatever
> they want. If google right now uses some magical group for this, they
> could make the underlying block device be writable for that group.
I'd suggest making it be if you had *read* access to the block device.
After all, the risk that everyone was all excited about was the risk
of being able to read stale (deleted) data from old files. And
there's no point giving the userspace cluster file system daemon the
ability to corrupt the file system or set the setuid bit on some
And if we are going to go this far, then I'd also suggest using this
permission check to the user the ability to issue BLKDISCARD on a
file. Allowing BLKDISCARD on files is one that should have been even
more of a no-brainer, since it could never reveal stale data, but
simply wasn't guaranteed to have reliable results because it was a
hint to the underlying storage device. But this has also received a
huge amount of religious pushback, which is why this is also an
out-of-tree patch in the Google kernel. (If that means that our
competitors have a higher flash TCO than us, again, no skin off my
nose. I tried to get it upstream, and cost of forward porting the
patch each time we rebase the kernel isn't _that_ annoying.)
> We can do the security check at the filesystem level, because we have
> sb->s_bdev->bd_inode, and if you have read and write permissions to
> that inode, you might as well have permission to create a unsafe hole.
> That doesn't sound very hacky to me.