Re: [RFC PATCH 1/2] mm: introduce bmap_walk()

From: Al Viro
Date: Mon Jun 19 2017 - 14:20:15 EST


On Sun, Jun 18, 2017 at 09:51:52AM +0200, Christoph Hellwig wrote:

> > That said, I think "please don't add a new bmap()
> > user, use iomap instead" is a fair comment. You know me well enough to
> > know that would be all it takes to redirect my work, I can do without
> > the bluster.
>
> But that's not the point. The point is that ->bmap() semantics simply
> do not work in practice because they don't make sense.

Speaking of iomap, what's supposed to happen when doing a write into what
used to be a hole? Suppose we have a file with a megabyte hole in it
and there's some process mmapping that range. Another process does a
write over the entire range. We call ->iomap_begin() and allocate
disk blocks. Then we start copying data into those. In the meanwhile,
the first process attempts to fetch from an address in the middle of that
hole. What should happen?
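
For reference, the setup is easy to reproduce from userland. What follows is
only a sketch: the helper name peek_middle_of_hole() and the /tmp path are
made up for illustration, and being single-threaded it shows just the two
endpoints (what the mapping sees before the write and after it), not the
window in between, which is the part in question.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HOLE_SIZE (1024 * 1024)	/* the megabyte hole */

/*
 * Hypothetical helper: create a file that is one big hole, map it shared,
 * optionally write() over the whole range, and return the byte seen through
 * the mapping in the middle of the range.  Returns -1 on any failure.
 */
static int peek_middle_of_hole(int after_write)
{
	char path[] = "/tmp/holeXXXXXX";
	unsigned char *map;
	int fd, ok = 1, ret = -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);			/* file goes away on close() */

	if (ftruncate(fd, HOLE_SIZE) != 0)
		goto out;		/* could not create the hole */

	map = mmap(NULL, HOLE_SIZE, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	if (after_write) {
		/* the "other process": write over the entire range */
		char *buf = malloc(HOLE_SIZE);

		ok = 0;
		if (buf) {
			memset(buf, 'x', HOLE_SIZE);
			ok = pwrite(fd, buf, HOLE_SIZE, 0) == HOLE_SIZE;
			free(buf);
		}
	}
	if (ok)
		ret = map[HOLE_SIZE / 2];	/* fetch from the middle */

	munmap(map, HOLE_SIZE);
out:
	close(fd);
	return ret;
}
```

peek_middle_of_hole(0) reads zeroes from the hole; peek_middle_of_hole(1)
reads the written data, since the shared mapping and write() go through the
same page cache.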

Should the blocks we'd allocated in ->iomap_begin() be immediately linked
into whatever indirect blocks/btree/whatnot we are using? That would
require zeroing all of them first - otherwise that readpage will read
an uninitialized block. Another variant would be to delay linking them
in until ->iomap_end(), but... Suppose we get the page evicted by
memory pressure after the writer is finished with it. If ->readpage()
comes before ->iomap_end(), we'll need to somehow figure out that it's
not a hole anymore, or we'll end up with an uptodate page full of zeroes
observed by reads after a successful write().
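
To spell out both failure modes, here is a toy userland model. It is not
kernel code and every toy_* name in it is invented for illustration: one
page of a file, one disk block that may back it, and a flag saying whether
the block has been linked into the block map. It also assumes that writeback
of the evicted page lands in the block allocated by ->iomap_begin() even
while that block is still unlinked.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16
#define TOY_GARBAGE   0xAA	/* stale contents of a freshly allocated block */

/* One page of a file and the single disk block that may back it. */
struct toy_file {
	bool linked;				/* block visible in the block map? */
	bool cached;				/* page present in the page cache? */
	unsigned char disk[TOY_PAGE_SIZE];	/* the allocated block */
	unsigned char page[TOY_PAGE_SIZE];	/* page cache copy */
};

/* Allocate the block (not zeroed); optionally link it right away. */
static void toy_iomap_begin(struct toy_file *f, bool link_now)
{
	memset(f->disk, TOY_GARBAGE, TOY_PAGE_SIZE);
	f->linked = link_now;
}

/* Writer copies its data into the page cache. */
static void toy_copy_in(struct toy_file *f, unsigned char byte)
{
	memset(f->page, byte, TOY_PAGE_SIZE);
	f->cached = true;
}

/* Memory pressure: write the page back to its block, then drop it. */
static void toy_evict(struct toy_file *f)
{
	if (f->cached)
		memcpy(f->disk, f->page, TOY_PAGE_SIZE);
	f->cached = false;
}

/* Link the block into the map once the write is finished. */
static void toy_iomap_end(struct toy_file *f)
{
	f->linked = true;
}

/* Reader: on a cache miss, "readpage" consults the block map. */
static unsigned char toy_read(struct toy_file *f)
{
	if (!f->cached) {
		if (f->linked)
			memcpy(f->page, f->disk, TOY_PAGE_SIZE);
		else
			memset(f->page, 0, TOY_PAGE_SIZE);	/* still a hole */
		f->cached = true;	/* page is now "uptodate" */
	}
	return f->page[0];
}

/* Link at begin without zeroing: reader gets in before the copy. */
static unsigned char toy_early_link_read(void)
{
	struct toy_file f = { 0 };

	toy_iomap_begin(&f, true);
	return toy_read(&f);		/* sees the stale block contents */
}

/* Link at end: page evicted and re-read before the block is linked. */
static unsigned char toy_late_link_read(void)
{
	struct toy_file f = { 0 };

	toy_iomap_begin(&f, false);
	toy_copy_in(&f, 'x');
	toy_evict(&f);
	(void)toy_read(&f);		/* readpage still sees a hole */
	toy_iomap_end(&f);
	return toy_read(&f);		/* read after the write has completed */
}
```

In this model toy_early_link_read() returns the garbage byte and
toy_late_link_read() returns zero rather than 'x': the data did reach the
block, but the zero-filled page instantiated before linking stays in the
cache and is what later reads see.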

The comment you've got in linux/iomap.h would seem to suggest the second
interpretation, but neither it nor anything in Documentation discusses the
relations with readpage/writepage...