Re: 4.9.0 regression in pipe-backed iov_iter with systemd-nspawn
From: Al Viro
Date: Fri Jan 13 2017 - 20:32:19 EST
On Fri, Jan 13, 2017 at 04:59:37PM -0800, Linus Torvalds wrote:
> EXCEPT.
>
> I don't think "i->iov_offset" is actually correct. If you truncated
> the whole thing, you should have cleared iov_offset too, and that
> never happened. So truncating everything will not empty the buffers
> for us, we'll still get to that "if (off)" case and have nrbufs=1.
>
> So I'd expect something like
>
> if (!size)
> i->iov_offset = 0;
>
> in pipe_advance(), in order to really free all buffers for that case. No?
Why would advance by 0 change ->iov_offset here?
> Or is there some guarantee that iov_offset was already zero there? I
> don't see it, but I'm batting zero on this code...
It was zero initially (see iov_iter_pipe()). It was not affected by
iov_iter_get_pages() and friends. If there was copy_to_iter/zero_iter/
copy_page_to_iter for any non-zero amount of data, then it's _not_ zero and
should bloody well stay such.
Note that the normal case is something like O_DIRECT read grabbing pages,
doing a bunch of IO into such and, once it's done, iov_iter_advance()
for the actual amount read being done by
retval = mapping->a_ops->direct_IO(iocb, &data);
if (retval >= 0) {
iocb->ki_pos += retval;
iov_iter_advance(iter, retval);
}
which advances iter past the actually read data. If we proceed to
fall back to the do_generic_file_read(), it will do a bunch of
copy_page_to_iter() to the points past that.
default_file_splice_read() pretty much imitates O_DIRECT read in that
respect, with kernel_readv() being a counterpart of direct_IO.
generic_file_splice_read() is another PIPE_ITER user; that one really
calls ->read_iter(). On any non-error (including short reads) it will
not bother with iov_iter_advance() at all - ->read_iter() should've
called that already, if at all needed.
On error it does use iov_iter_advance(), pretty much as a way to
trigger pipe_truncate(). There we directly reset ->iov_offset to 0
and ->idx to its original value. Strictly speaking, we could live
without calling it there at all - normally ->read_iter() instances follow
the usual "if you'd managed to read something, report a short read,
not an error" approach. None of the exceptions are good candidates for
use of generic_file_splice_read() anyway.
However, theoretically it is possible that ->read_iter() instance does
successful copy_to_iter() and then decides to return an error. This
} else if (ret < 0) {
to.idx = idx;
to.iov_offset = 0;
iov_iter_advance(&to, 0); /* to free what was emitted */
in generic_file_splice_read() catches any such cases. Here the call
of iov_iter_advance(&to, 0) is guaranteed to be equivalent to
pipe_truncate(&to); we might make the latter non-static and call it
directly, but I'm not sure it's worth doing.
> Can you please explain when you are once more awake..