Re: [PATCH] aio: prevent the final fput() in the middle of vfs_poll() (Re: KASAN: use-after-free Read in unix_dgram_poll)

From: Al Viro
Date: Sun Mar 03 2019 - 21:36:37 EST

On Sun, Mar 03, 2019 at 02:23:33PM -0800, Linus Torvalds wrote:

OK, having dug through the archives, the reasons were not strong.
So that part is OK...

> @@ -1060,6 +1071,8 @@ static inline void iocb_put(struct aio_kiocb *iocb)
> {
> if (refcount_read(&iocb->ki_refcnt) == 0 ||
> refcount_dec_and_test(&iocb->ki_refcnt)) {
> + if (iocb->ki_filp)
> + fput(iocb->ki_filp);
> percpu_ref_put(&iocb->ki_ctx->reqs);
> kmem_cache_free(kiocb_cachep, iocb);
> }

That reminds me - refcount_t here is rather ridiculous; what we
have is
* everything other than aio_poll: ki_refcnt is 0 all along
* aio_poll: originally 0, then set to 2, then two iocb_put()
are done (either both synchronous to aio_poll(), or one sync and one

That's not a refcount at all. It's a flag, set only for aio_poll ones.
And that test in iocb_put() is "if flag is set, clear it and bugger off".

What's worse, AFAICS the callers in aio_poll() are buggered (see below)

> static inline void aio_poll_complete(struct aio_kiocb *iocb, __poll_t mask)
> {
> - struct file *file = iocb->poll.file;
> -
> aio_complete(iocb, mangle_poll(mask), 0);
> - fput(file);
> }

No reasons to keep that function at all now...

> - if (unlikely(apt.error)) {
> - fput(req->file);
> + if (unlikely(apt.error))
> return apt.error;
> - }
> if (mask)
> aio_poll_complete(aiocb, mask);

Looking at that thing... How does it manage to avoid leaks
when we try to use it on e.g. /dev/tty, which has
poll_wait(file, &tty->read_wait, wait);
poll_wait(file, &tty->write_wait, wait);
in n_tty_poll()?

AFAICS, we'll succeed adding it to the first queue, then have
aio_poll_queue_proc() fail and set apt.error to -EINVAL.
Suppose we are looking for EPOLLIN and there's nothing ready
to read. We'll go
mask = vfs_poll(req->file, & & req->events;
mask is 0.
if (unlikely(!req->head)) {
nope - it's &tty->read_wait, not NULL
/* we did not manage to set up a waitqueue, done */
goto out;

if (req->woken) {
nope - no wakeups so far
/* wake_up context handles the rest */
mask = 0;
apt.error = 0;
} else if (mask || apt.error) {
apt.error is non-zero
/* if we get an error or a mask we are done */
OK, removed from queue
} else {
/* actually waiting for an event */
list_add_tail(&aiocb->ki_list, &ctx->active_reqs);
aiocb->ki_cancel = aio_poll_cancel;

if (unlikely(apt.error)) {
return apt.error;
... and we return -EINVAL to __io_submit_one(), where we hit
* If ret is 0, we'd either done aio_complete() ourselves or have
* arranged for that to be done asynchronously. Anything non-zero
* means that we need to destroy req ourselves.
if (ret)
goto out_put_req;
return 0;
if (req->ki_eventfd)
put_reqs_available(ctx, 1);
return ret;

and all knowledge of req is lost. But we'd done only *one* iocb_put() in
that case, and ->ki_refcnt had been set to 2 by aio_poll(). How could it
avoid a leak? The same goes for "->poll() returns something without
bothering to call poll_wait()" case, actually...

IOW, I would rather have aio_poll() (along with your fput-a-bit-later change)
do this -
if (mask && !apt.error)
aio_complete(aiocb, mangle_poll(mask), 0);
return apt.error;