Re: WARNING in iov_iter_revert (2)

From: Al Viro
Date: Sat Feb 20 2021 - 14:33:50 EST


On Sat, Feb 20, 2021 at 05:38:49PM +0000, Al Viro wrote:
> On Sat, Feb 20, 2021 at 08:56:40AM -0800, Linus Torvalds wrote:
> > Al,
> > This is the "FIXME! Have Al check this!" case in do_tty_write(). You were
> > in on that whole discussion, but we never did get to that issue...
> >
> > There are some subtle rules about doing the iov_iter_revert(), but what's
> > the best way to do this properly? Instead of doing a copy_from_iter() and
> > then reverting the part that didn't fit in the buffer, doing a
> > non-advancing copy and then advancing the amount that did fit, or what?
> >
> > I still don't have power, so this is all me on mobile with html email
> > (sorry), and limited ability to really look closer.
> >
> > "Help me, Albi-wan Viro, you're my only hope"
>
> Will check... BTW, when you get around to doing pulls, could you pick
> the replacement (in followup) instead of the first pull request for
> work.namei? Jens has caught a braino in the last commit there...

It turned out to be really amusing. What happens is write(fd, NULL, 0)
on /dev/ttyprintk, with N_GSM0710 for ldisc (== "pass the data as
is to tty->op->write()". And that's the first write since opening
that sucker, so we end up with
/* write_buf/write_cnt is protected by the atomic_write_lock mutex */
if (tty->write_cnt < chunk) {
unsigned char *buf_chunk;

if (chunk < 1024)
chunk = 1024;

buf_chunk = kmalloc(chunk, GFP_KERNEL);
if (!buf_chunk) {
ret = -ENOMEM;
goto out;
}
kfree(tty->write_buf);
tty->write_cnt = chunk;
tty->write_buf = buf_chunk;
}
doing nothing - ->write_cnt is still 0 and ->write_buf - NULL. Then
we copy 0 bytes from source to ->write_buf(), which reports that 0
bytes had been copied, TYVM. Then we call
ret = write(tty, file, tty->write_buf, size);
i.e.
ret = gsm_write(tty, file, NULL, 0);
which calls
tpk_write(tty, NULL, 0)
which does
tpk_printk(NULL, 0);
and _that_ has a very special semantics:
int i = tpk_curr;

if (buf == NULL) {
tpk_flush();
return i;
}
i.e. it *can* return a positive number that gets propagated all way
back to do_tty_write(). And then you notice that it has reports
successful write of amount other than what you'd passed and tries
to pull back. By amount passed - amount written. With iov_iter_revert()
saying that some tosser has asked it to revert by something close to
~(size_t)0.

IOW, it's not iov_iter_revert() being weird or do_tty_write() misuing it -
it's tpk_write() playing silly buggers. Note that old tree would've
gone through seriously weird contortions on the same call:
// chunk and count are 0, ->write_buf is NULL
for (;;) {
size_t size = count;
if (size > chunk)
size = chunk;
ret = -EFAULT;
if (copy_from_user(tty->write_buf, buf, size))
break;
ret = write(tty, file, tty->write_buf, size);
if (ret <= 0)
break;
written += ret;
buf += ret;
count -= ret;
if (!count)
break;
ret = -ERESTARTSYS;
if (signal_pending(current))
break;
cond_resched();
}
and we get written = ret = small positive, count = - that amount,
buf = NULL + that mount. On the next iteration size = 0 (since
chunk is still 0), with same no-op copy_from_user() of 0 bytes,
then gsm_write(tty, file, NULL, 0) and since tpk_flush() zeroes
tpk_curr we finally get 0 out of tpk_printk/tpk_write/gsm_write
and bugger off on if (ret <= 0). Then we have the value in written
returned.

So yeah, this return value *was* returned to userland. Except that
if we had done any writes before that, we'd find ->write_buf
non-NULL and the magical semantics of write(fd, NULL, 0) would
*not* have triggered - we would've gotten zero.

Do we want to preserve that weirdness of /dev/ttyprintk writes?
That's orthogonal to the iov_iter uses in there.