Re: [PATCH 08/31] aio: implement IOCB_CMD_POLL

From: Al Viro
Date: Tue May 22 2018 - 19:51:49 EST


On Tue, May 22, 2018 at 11:05:24PM +0100, Al Viro wrote:
> > +{
> > + struct aio_kiocb *iocb = container_of(req, struct aio_kiocb, poll);
> > +
> > + fput(req->file);
> > + aio_complete(iocb, mangle_poll(mask), 0);
> > +}
>
> Careful.
>
> > +static int aio_poll_cancel(struct kiocb *iocb)
> > +{
> > + struct aio_kiocb *aiocb = container_of(iocb, struct aio_kiocb, rw);
> > + struct poll_iocb *req = &aiocb->poll;
> > + struct wait_queue_head *head = req->head;
> > + bool found = false;
> > +
> > + spin_lock(&head->lock);
> > + found = __aio_poll_remove(req);
> > + spin_unlock(&head->lock);
>
> What's to guarantee that req->head has not been freed by that point?
> Look: wakeup finds ->ctx_lock held, so it leaves the sucker on the
> list, removes it from queue and schedules the call of __aio_poll_complete().
> Which gets executed just as we hit aio_poll_cancel(), starting with fput().
>
> You really want to do aio_complete() before fput(). That way you know that
> req->wait is alive and well at least until iocb gets removed from the list.

Oh, bugger...

wakeup:
    removed from queue
    schedule __aio_poll_complete()

cancel:
    grab ->ctx_lock
    remove from list

work:
    aio_complete()
    check if it's in the list
    it isn't, move on to free the sucker

cancel:
    call ->ki_cancel()
    BOOM

Looks like we want to call ->ki_cancel() *BEFORE* removing from the list,
as well as doing fput() after aio_complete(). The same ordering, BTW, is
needed for aio_read() et al.
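A minimal single-threaded sketch of why the ->ki_cancel()/list-removal order matters, under a loudly hypothetical model: model_iocb, model_ctx, model_complete() and model_ki_cancel() are made-up userspace stand-ins, not fs/aio.c code, and each statement in the two scenario functions is one step of the CPU1/CPU2 interleaving. The point it illustrates: holding ->ctx_lock does not help by itself if the iocb is unlinked before ->ki_cancel() runs, because the completion side never takes the lock for an iocb that is already off the list.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not kernel API.  Each line of the scenarios below
 * is one step of the CPU1 (cancel) / CPU2 (completion) interleaving.
 */
struct model_iocb {
	bool on_list;	/* still on the ctx's list of active requests */
	bool freed;	/* completion side has freed it */
	bool cancelled;	/* ->ki_cancel() ran on a live iocb */
};

struct model_ctx {
	bool ctx_lock_held;	/* CPU1 holds ->ctx_lock */
	bool boom;		/* something touched a freed iocb */
};

/* Stand-in for ->ki_cancel(): calling it on a freed iocb is the BOOM. */
static void model_ki_cancel(struct model_ctx *ctx, struct model_iocb *iocb)
{
	if (iocb->freed)
		ctx->boom = true;
	else
		iocb->cancelled = true;
}

/*
 * Completion side: an iocb still on the list has to be unlinked under
 * ->ctx_lock; one that is already off the list is freed without touching
 * the lock.  Returns false if it would have to spin on ->ctx_lock.
 */
static bool model_complete(struct model_ctx *ctx, struct model_iocb *iocb)
{
	if (iocb->on_list) {
		if (ctx->ctx_lock_held)
			return false;
		iocb->on_list = false;
	}
	iocb->freed = true;
	return true;
}

/* Order in the patch: unlink first, ->ki_cancel() second. */
static bool remove_then_cancel(void)
{
	struct model_ctx ctx = { 0 };
	struct model_iocb iocb = { .on_list = true };

	ctx.ctx_lock_held = true;	/* CPU1: grab ->ctx_lock */
	iocb.on_list = false;		/* CPU1: remove from list */
	model_complete(&ctx, &iocb);	/* CPU2: not on list, frees it */
	model_ki_cancel(&ctx, &iocb);	/* CPU1: ->ki_cancel() on freed iocb */
	ctx.ctx_lock_held = false;	/* CPU1: drop ->ctx_lock */
	return ctx.boom;
}

/* Proposed order: ->ki_cancel() while the iocb is still on the list. */
static bool cancel_then_remove(void)
{
	struct model_ctx ctx = { 0 };
	struct model_iocb iocb = { .on_list = true };
	bool done;

	ctx.ctx_lock_held = true;		/* CPU1: grab ->ctx_lock */
	done = model_complete(&ctx, &iocb);	/* CPU2: on list, must spin */
	assert(!done);
	model_ki_cancel(&ctx, &iocb);		/* CPU1: iocb still alive */
	iocb.on_list = false;			/* CPU1: now remove from list */
	ctx.ctx_lock_held = false;		/* CPU1: drop ->ctx_lock */
	done = model_complete(&ctx, &iocb);	/* CPU2: proceeds and frees */
	assert(done && iocb.freed && iocb.cancelled);
	return ctx.boom;
}
```

Under this model remove_then_cancel() reports the use-after-free and cancel_then_remove() does not: with the iocb still on the list, the completion has to wait for ->ctx_lock, which CPU1 only drops after ->ki_cancel() has returned.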

Look:
CPU1: io_cancel() grabs ->ctx_lock, finds iocb and removes it from the list.
CPU2: aio_rw_complete() on that iocb. Since the sucker is not in the list
anymore, we do NOT spin on ->ctx_lock and proceed to free iocb.
CPU1: pass freed iocb to ->ki_cancel(). BOOM.

and if we have fput() done first (in aio_rw_complete()) we are vulnerable to
CPU1: io_cancel() grabs ->ctx_lock, finds iocb and removes it from the list.
CPU2: aio_rw_complete() on that iocb. fput() done, opening us to rmmod.
CPU1: call ->ki_cancel(), which now points into the unloaded module. BOOM.
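A second sketch with the same caveats: rw_state, rw_complete(), rw_fput(), rw_ki_cancel() and rw_race() are hypothetical userspace stand-ins, and dropping the last file reference is modelled as an immediate, successful rmmod. It keeps the cancel ordering proposed above (->ki_cancel() before unlinking, under ->ctx_lock) and varies only where the completion side does fput(), to show that fixing the cancel side alone is not enough.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not kernel API: the file reference held by the
 * iocb is what keeps the module providing ->ki_cancel() loaded, and
 * dropping the last reference is treated as an immediate rmmod.
 */
struct rw_state {
	bool ctx_lock_held;	/* CPU1 holds ->ctx_lock */
	bool on_list;		/* iocb still on the ctx's active list */
	int file_refs;		/* references pinning the module */
	bool fput_done;		/* completion has dropped its reference */
	bool module_loaded;	/* code behind ->ki_cancel() still exists */
	bool boom;		/* called into an unloaded module */
};

static void rw_fput(struct rw_state *s)
{
	s->fput_done = true;
	if (--s->file_refs == 0)
		s->module_loaded = false;	/* rmmod wins the race */
}

/* Stand-in for the indirect call through ->ki_cancel. */
static void rw_ki_cancel(struct rw_state *s)
{
	if (!s->module_loaded)
		s->boom = true;
}

/*
 * Completion side.  With fput_first the reference is dropped before the
 * list check; otherwise only once the iocb is off the list.  Returns
 * false if it would still be spinning on ->ctx_lock; call again to retry.
 */
static bool rw_complete(struct rw_state *s, bool fput_first)
{
	if (fput_first && !s->fput_done)
		rw_fput(s);
	/* aio_complete() part: unlink under the lock if still on the list */
	if (s->on_list) {
		if (s->ctx_lock_held)
			return false;
		s->on_list = false;
	}
	if (!s->fput_done)
		rw_fput(s);
	return true;
}

/* CPU1 cancels (->ki_cancel() before unlinking) while CPU2 completes. */
static bool rw_race(bool fput_first)
{
	struct rw_state s = {
		.on_list = true, .file_refs = 1, .module_loaded = true,
	};
	bool done;

	s.ctx_lock_held = true;			/* CPU1: grab ->ctx_lock */
	done = rw_complete(&s, fput_first);	/* CPU2: maybe fput(), then spin */
	assert(!done);
	rw_ki_cancel(&s);			/* CPU1: call ->ki_cancel() */
	s.on_list = false;			/* CPU1: remove from list */
	s.ctx_lock_held = false;		/* CPU1: drop ->ctx_lock */
	done = rw_complete(&s, fput_first);	/* CPU2: finishes completion */
	assert(done && s.fput_done);
	return s.boom;
}
```

In this model rw_race(true) hits the BOOM even though CPU2 is stuck on ->ctx_lock, because the reference is already gone by then; rw_race(false) is safe, since fput() cannot run until CPU1 has finished with ->ki_cancel() and dropped the lock.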