Re: [PATCH 2/2] block: create ioctl to discard-or-zeroout a range of blocks

From: Linus Torvalds
Date: Thu Mar 03 2016 - 13:14:23 EST


On Thu, Mar 3, 2016 at 10:01 AM, Martin K. Petersen
<martin.petersen@xxxxxxxxxx> wrote:
>>>>>> "Linus" == Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
>
> Linus> .. but the flag doesn't even set that. Even if you avoid TRIM,
> Linus> there is absolutely zero guarantees that WRITE_SAME would do
> Linus> "real storage blocks full of zeroes backing the LBAs they just
> Linus> wrote out".
>
> That's not entirely true. Writing the blocks may cause them to be
> allocated on the storage device (depending on which flags we feed it in
> WRITE SAME).

Ok, so now we're getting somewhere, with actual _reasons_ why somebody
would want to use one interface over another.

> The filesystems people wanted the following semantics:
>
> - deallocate, don't care about contents for future reads (discard)
> - deallocate, guarantee zeroes on future reads (zeroout)
> - (re)allocate, guarantee zeroes on future reads (zeroout)
>
> Maybe we just need a better naming scheme...

Yes.

And this does make me think that Christoph is right: this would be so
much better if the block layer just supported fallocate() instead,
which already has those operations.
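
As a rough userspace sketch of how two of the semantics quoted above might
map onto existing fallocate() modes (the enum and helper names here are
made up for illustration, and the mapping is only one reading of the list;
the pure "don't care about contents" discard case is left out because it
has no obvious counterpart among these flags):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>          /* fallocate() */
#include <linux/falloc.h>   /* FALLOC_FL_* mode flags */
#include <sys/types.h>      /* off_t */

/* Hypothetical names for two of the requested semantics. */
enum range_op {
	RANGE_DEALLOC_ZERO,	/* deallocate, zeroes on future reads */
	RANGE_ALLOC_ZERO,	/* (re)allocate, zeroes on future reads */
};

/* Translate a requested semantic into a fallocate() mode, or -1. */
static int range_op_to_mode(enum range_op op)
{
	switch (op) {
	case RANGE_DEALLOC_ZERO:
		/* PUNCH_HOLE must always be combined with KEEP_SIZE. */
		return FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
	case RANGE_ALLOC_ZERO:
		return FALLOC_FL_ZERO_RANGE;
	}
	return -1;
}

/* Apply the operation to [start, start + len) of an open descriptor. */
static int range_apply(int fd, enum range_op op, off_t start, off_t len)
{
	int mode = range_op_to_mode(op);

	assert(mode >= 0);
	return fallocate(fd, mode, start, len);
}
```

Whether a given filesystem honours each mode is up to that filesystem, so
a caller would still need to handle EOPNOTSUPP.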

Right now we have

	if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
		return -ENODEV;

so right now the vfs_fallocate() code explicitly disallows block
devices, but that would be easy to expand.

Would people be happy with that kind of patch instead? It would
certainly make all my objections go away..

Linus