Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c

From: Larry Finger
Date: Tue May 24 2016 - 12:10:40 EST


On 05/23/2016 07:18 PM, Al Viro wrote:
On Mon, May 23, 2016 at 04:30:43PM -0500, Larry Finger wrote:
The mainline kernels past 4.6.0 hang when logging in. There are no
error messages, and the machine seems to be waiting for some event that
never happens.

The problem has been bisected to commit dd254f5a382c ("fold checks into
iterate_and_advance()"). The bisection has been verified.

The problem is the call from iov_iter_advance(). When I reinstated the old
macro with a new name and used it in that routine, the system works.
Obviously, the call that seems to be incorrect has some benefits. My
quick-and-dirty patch is attached.

I will be willing to test any patch you prepare.

Hangs where and how? A reproducer, please... This is really weird - the
only change there is in the cases when
* iov_iter_advance(i, n) is called with n greater than the remaining
amount. It's a bug, plain and simple - old variant would've been left in
seriously buggered state and at the very least we want to catch any such
places for the sake of backports
* iov_iter_advance(i, 0) - both old and new code leave *i unchanged,
but the old one dereferences i->iov[0], which may be pointing beyond the end of
array by that point. The value read from there was not used by the old code,
at that.

Could you slap WARN_ON(size > i->count) in the very beginning of
iov_iter_advance() (the mainline variant) and see what triggers on your
reproducer?

As I wrote earlier, i->count was greater than zero, but size was zero, which caused the bulk of iterate_and_advance() to be skipped.

For now, the following one-line hack allows my system to boot:

diff --git a/fs/read_write.c b/fs/read_write.c
index 933b53a..d5d64d9 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -721,6 +721,7 @@ static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
 		ret += nr;
 		if (nr != iovec.iov_len)
 			break;
+		nr = max_t(ssize_t, nr, 1);
 		iov_iter_advance(iter, nr);
 	}

I have no idea what subtle bug in do_loop_readv_writev() is causing nr to be zero, but it seems to have been exposed by commit dd254f5a382c.
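
For illustration only, here is a user-space toy model of the behaviour described above: a per-segment loop that advances by nr, where advancing by zero is a no-op. Everything in it (struct toy_iter, toy_advance(), toy_loop()) is hypothetical and is not the kernel code. The zero-length first segment is an assumed trigger for nr == 0; nothing in this thread confirms that this is what happens on the affected machine.

```c
#include <stddef.h>

/* Toy stand-in for a multi-segment iterator; NOT the kernel's struct iov_iter. */
struct toy_iter {
	const size_t *seg_len;	/* lengths of the remaining segments */
	size_t nr_segs;		/* number of remaining segments */
	size_t offset;		/* bytes already consumed in seg_len[0] */
	size_t count;		/* total bytes remaining */
};

/* Advance by n bytes. As in the behaviour described above, n == 0 does nothing. */
static void toy_advance(struct toy_iter *it, size_t n)
{
	if (n > it->count)
		n = it->count;
	if (n == 0)
		return;
	it->count -= n;
	n += it->offset;
	/* Step over every segment that is now fully consumed (or empty). */
	while (it->nr_segs && n >= it->seg_len[0]) {
		n -= it->seg_len[0];
		it->seg_len++;
		it->nr_segs--;
	}
	it->offset = n;
}

/*
 * Handle the current segment, then advance by the amount handled.
 * Returns the number of passes, or -1 once max_passes is exceeded
 * (the stand-in for a hang). With clamp set, nr is forced to at
 * least 1, the analogue of max_t(ssize_t, nr, 1).
 */
static int toy_loop(struct toy_iter *it, int clamp, int max_passes)
{
	int passes = 0;

	while (it->count) {
		size_t nr;

		if (passes++ == max_passes)
			return -1;
		/* Pretend the rest of the current segment was handled in full. */
		nr = it->seg_len[0] - it->offset;
		if (clamp && nr == 0)
			nr = 1;
		toy_advance(it, nr);
	}
	return passes;
}
```

In this model, segments of length {0, 4} never make progress without the clamp, because every pass advances by zero. With the clamp the loop terminates, but it also steps one byte into the following segment without handling it, which is one reason to treat the clamp as a workaround rather than a fix.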

Larry