Re: [PATCH] aio: Add support for the POLLFREE

From: Jeff Moyer
Date: Wed Sep 29 2021 - 14:18:27 EST


Adding Oleg and Christoph.

Ramji Jiyani <ramjiyani@xxxxxxxxxx> writes:

> Commit f5cb779ba163 ("ANDROID: binder: remove waitqueue when thread
> exits.") fixed the use-after-free in eventpoll but aio still has the
> same issue because it doesn't honor the POLLFREE flag.
>
> Add support for the POLLFREE flag to force-complete the iocb inline in
> aio_poll_wake(). A thread may use it to signal its exit and/or request
> cleanup while a poll request is pending. In this case, aio_poll_wake()
> needs to make sure it doesn't keep any reference to the queue entry
> before returning from wake, to avoid a possible use-after-free via the
> poll_cancel() path.

Is this an in-kernel user? Can you explain more about how or when this
happens? Do you have a stack trace that shows the problem? I'm not
sure this use of POLLFREE matches the original intent of the flag, but
hopefully Oleg can comment on that.

Thanks,
Jeff

> The POLLFREE flag is no longer exclusive to epoll and is now shared
> with aio. Remove the comment from poll.h to avoid confusion. Also
> enclose the POLLFREE macro definition in parentheses to fix a
> checkpatch error.
>
> Signed-off-by: Ramji Jiyani <ramjiyani@xxxxxxxxxx>
> ---
> fs/aio.c | 45 ++++++++++++++++++---------------
> include/uapi/asm-generic/poll.h | 2 +-
> 2 files changed, 26 insertions(+), 21 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 51b08ab01dff..5d539c05df42 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1674,6 +1674,7 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
>  {
>  	struct poll_iocb *req = container_of(wait, struct poll_iocb, wait);
>  	struct aio_kiocb *iocb = container_of(req, struct aio_kiocb, poll);
> +	struct kioctx *ctx = iocb->ki_ctx;
>  	__poll_t mask = key_to_poll(key);
>  	unsigned long flags;
>
> @@ -1683,29 +1684,33 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
>
>  	list_del_init(&req->wait.entry);
>
> -	if (mask && spin_trylock_irqsave(&iocb->ki_ctx->ctx_lock, flags)) {
> -		struct kioctx *ctx = iocb->ki_ctx;
> +	/*
> +	 * Use irqsave/irqrestore because not all filesystems (e.g. fuse)
> +	 * call this function with IRQs disabled and because IRQs have to
> +	 * be disabled before ctx_lock is obtained.
> +	 */
> +	if (mask & POLLFREE) {
> +		/* Force complete iocb inline to remove refs to deleted entry */
> +		spin_lock_irqsave(&ctx->ctx_lock, flags);
> +	} else if (!(mask && spin_trylock_irqsave(&ctx->ctx_lock, flags))) {
> +		/* Can't complete iocb inline; schedule for later */
> +		schedule_work(&req->work);
> +		return 1;
> +	}
>
> -		/*
> -		 * Try to complete the iocb inline if we can. Use
> -		 * irqsave/irqrestore because not all filesystems (e.g. fuse)
> -		 * call this function with IRQs disabled and because IRQs
> -		 * have to be disabled before ctx_lock is obtained.
> -		 */
> -		list_del(&iocb->ki_list);
> -		iocb->ki_res.res = mangle_poll(mask);
> -		req->done = true;
> -		if (iocb->ki_eventfd && eventfd_signal_allowed()) {
> -			iocb = NULL;
> -			INIT_WORK(&req->work, aio_poll_put_work);
> -			schedule_work(&req->work);
> -		}
> -		spin_unlock_irqrestore(&ctx->ctx_lock, flags);
> -		if (iocb)
> -			iocb_put(iocb);
> -	} else {
> +	/* complete iocb inline */
> +	list_del(&iocb->ki_list);
> +	iocb->ki_res.res = mangle_poll(mask);
> +	req->done = true;
> +	if (iocb->ki_eventfd && eventfd_signal_allowed()) {
> +		iocb = NULL;
> +		INIT_WORK(&req->work, aio_poll_put_work);
>  		schedule_work(&req->work);
>  	}
> +	spin_unlock_irqrestore(&ctx->ctx_lock, flags);
> +	if (iocb)
> +		iocb_put(iocb);
> +
>  	return 1;
>  }
>
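
For what it's worth, here is how I read the branch logic of the fs/aio.c
hunk above, written as a small userspace model. The names in it
(model_poll_wake, enum wake_action, MODEL_POLLIN, MODEL_POLLFREE,
model_self_check) are made up for illustration, and the ctx_lock trylock
is reduced to a boolean argument, so treat this as a sketch of the
control flow only and not of the locking or reference counting:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the poll bits; 0x4000 is taken from the
 * quoted poll.h hunk, 0x0001 is the usual POLLIN value. */
#define MODEL_POLLIN   0x0001u
#define MODEL_POLLFREE 0x4000u

/* What the patched wake callback would decide to do (illustrative). */
enum wake_action {
	WAKE_COMPLETE_INLINE_FORCED, /* POLLFREE: take the lock unconditionally */
	WAKE_COMPLETE_INLINE,        /* trylock succeeded, complete right away */
	WAKE_DEFER_TO_WORK,          /* no mask or lock contended: schedule_work() */
};

/*
 * Mirrors the if / else-if chain in the patched hunk.
 * trylock_succeeds models the result of spin_trylock_irqsave().
 */
static enum wake_action model_poll_wake(uint32_t mask, bool trylock_succeeds)
{
	if (mask & MODEL_POLLFREE)
		return WAKE_COMPLETE_INLINE_FORCED;
	if (!(mask && trylock_succeeds))
		return WAKE_DEFER_TO_WORK;
	return WAKE_COMPLETE_INLINE;
}

/* Walks the four cases the hunk distinguishes. */
static void model_self_check(void)
{
	/* POLLFREE never defers, even if the trylock would have failed. */
	assert(model_poll_wake(MODEL_POLLFREE, false) ==
	       WAKE_COMPLETE_INLINE_FORCED);
	/* Ordinary event, lock available: complete inline. */
	assert(model_poll_wake(MODEL_POLLIN, true) == WAKE_COMPLETE_INLINE);
	/* Ordinary event, lock contended: defer to the workqueue. */
	assert(model_poll_wake(MODEL_POLLIN, false) == WAKE_DEFER_TO_WORK);
	/* Empty mask: defer, as in the pre-patch code. */
	assert(model_poll_wake(0, true) == WAKE_DEFER_TO_WORK);
}
```

In this model the POLLFREE case never falls back to the workqueue, which
is the behavior change the patch is after. Whether it is safe to block on
ctx_lock from the wake callback is a separate question that the model
does not answer.
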
> diff --git a/include/uapi/asm-generic/poll.h b/include/uapi/asm-generic/poll.h
> index 41b509f410bf..35b1b69af729 100644
> --- a/include/uapi/asm-generic/poll.h
> +++ b/include/uapi/asm-generic/poll.h
> @@ -29,7 +29,7 @@
> #define POLLRDHUP 0x2000
> #endif
>
> -#define POLLFREE (__force __poll_t)0x4000 /* currently only for epoll */
> +#define POLLFREE ((__force __poll_t)0x4000)
>
> #define POLL_BUSY_LOOP (__force __poll_t)0x8000