Re: 4.9.0 regression in pipe-backed iov_iter with systemd-nspawn

From: Al Viro
Date: Fri Jan 13 2017 - 20:44:02 EST


On Sat, Jan 14, 2017 at 01:24:28AM +0000, Al Viro wrote:
> On Fri, Jan 13, 2017 at 04:59:37PM -0800, Linus Torvalds wrote:
>
> > EXCEPT.
> >
> > I don't think "i->iov_offset" is actually correct. If you truncated
> > the whole thing, you should have cleared iov_offset too, and that
> > never happened. So truncating everything will not empty the buffers
> > for us, we'll still get to that "if (off)" case and have nrbufs=1.
> >
> > So I'd expect something like
> >
> > 	if (!size)
> > 		i->iov_offset = 0;
> >
> > in pipe_advance(), in order to really free all buffers for that case. No?
>
> Why would advance by 0 change ->iov_offset here?
>
> > Or is there some guarantee that iov_offset was already zero there? I
> > don't see it, but I'm batting zero on this code...
>
> It was zero initially (see iov_iter_pipe()). It was not affected by
> iov_iter_get_pages() and friends. If there was copy_to_iter/zero_iter/
> copy_page_to_iter for any non-zero amount of data, then it's _not_ zero and
> should bloody well stay such.

PS: note that after copy_page_to_iter()/copy_to_iter()/zero_iter() we have
->idx pointing to the last used buffer and ->iov_offset pointing to the end
of data in it, even if it's certain to be full and the next piece written
will go into the next buffer. The only situation where we have zero
->iov_offset is when no copying had been done at all (via that iov_iter,
that is). In that case ->idx points to the empty buffer.

If you start with empty pipe and copy_to_pipe() until it fills, you'll have
(assuming e.g. 4K pages and ->curbuf == 3 in the beginning)
	->idx == 3, ->iov_offset == 0 initially
	->idx == 3, ->iov_offset == 4096 after 4096 bytes copied
	->idx == 4, ->iov_offset == 1 after 4097 bytes
	...
	->idx == 2, ->iov_offset == 4096 by the end of it.

That's what this "correction" is about...
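
For illustration only, the progression above can be reproduced with a small
user-space model. This is a sketch, not the kernel code: struct toy_iter,
toy_copy(), PAGE_SZ and NBUFS are hypothetical names, and it assumes 4K pages,
a 16-slot ring and a caller that never copies more than the pipe can hold.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ	4096u	/* assumed page size, as in the example above */
#define NBUFS	16u	/* assumed number of slots in the pipe ring */

/* Hypothetical stand-in for the two fields discussed in this thread. */
struct toy_iter {
	unsigned int idx;	/* last used buffer, or the empty one if nothing was copied */
	size_t iov_offset;	/* end of data in that buffer; 0 only before any copy */
};

/*
 * Model of copying 'bytes' through the iterator: ->idx stays on the last
 * buffer that received data, even when that buffer is now full.  We move
 * to the next slot only when more data actually arrives.
 */
static void toy_copy(struct toy_iter *it, size_t bytes)
{
	assert(it->iov_offset <= PAGE_SZ);

	while (bytes) {
		size_t room, chunk;

		if (it->iov_offset == PAGE_SZ) {
			/* current buffer is full: step to the next slot */
			it->idx = (it->idx + 1) % NBUFS;
			it->iov_offset = 0;
		}
		room = PAGE_SZ - it->iov_offset;
		chunk = bytes < room ? bytes : room;
		it->iov_offset += chunk;
		bytes -= chunk;
	}
}
```

Starting from ->idx == 3, ->iov_offset == 0 and calling toy_copy() with 4096,
then 1, then the remaining 16 * 4096 - 4097 bytes walks through the states
listed above and ends at ->idx == 2, ->iov_offset == 4096. Note that a copy
of 0 bytes leaves both fields alone.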