Re: [PATCH 06/11] pipe: no need to confirm page cache buf

From: Miklos Szeredi
Date: Tue Sep 27 2016 - 03:35:02 EST

On Tue, Sep 27, 2016 at 5:40 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Wed, Sep 14, 2016 at 10:37:11AM +0200, Miklos Szeredi wrote:
>> Things could happen to that page that make it not uptodate while sitting in
>> the pipe, but it's questionable whether we should care about that.
>> Checking for being uptodate in the face of such page state change is always
>> going to be racy.
> I'm not sure it's the right thing to do here; that area looks like a victim
> of serious bitrot - once upon a time it was ->map(), which used to lock
> page, verity that it's valid, and kmap it. ->unmap() did kunmap + unlock.
> Then the validate part got split off (->pin(), later renamed to ->confirm()),
> with lock _not_ held over the kmap/kunmap. That's the point when it got racy,
> AFAICS. OTOH, I would really hate to hold a page locked over e.g. copying to
> userland - too easy to get deadlocks that way.
> Jens, could you comment? Pages definitely shouldn't be getting into pipe
> without having been uptodate; the question is what (if anything) should be
> done about having a page go non-uptodate (on truncate, hole-punching, etc.)
> while a reference to it is sitting in a pipe buffer.

Truncate/holepunch is interesting. It doesn't make the page go
non-uptodate, just clears page->mapping. Works like a charm, old data
can be read from the pipe just fine.

Partial truncate is even more interesting, because it will result in
partially cleared data (race is there with plain read() as well,
AFAICS, but very narrow).

Page invalidation by filesystems is similar to truncate. Old data can
be read from the pipe. And in fact this probably the way we want it
to work, since redoing the page lookup in ->confirm() would be really

Also just modifying the data sitting in the pipe will also result in
undefined behavior; either the old or the new data can be read out
from the other end of the pipe.

And while not explicitly documented, all the above cases are fine and
implicit in the non-copy behavior of splice. Perhaps the man page
should note that modifying data after splice but before reading from
the other end of the pipe results in undefined behavior.

I haven't looked at filesystems, but generic code calls
ClearPageUptodate() from only a few places:

end_swap_bio_read(): does it on a non-uptodate page
page_endio(): AFAICS part of the page reading chain, again doing it on
a non-uptodate page
me_swapcache_dirty(): memory error on dirty swapcache page, this is
the only one that would make sense to trigger EIO on reading the pipe
buffer. But then why only the dirty swapcache case?