Re: [PATCH v1] virt: acrn: Fix irqfd use-after-free during async shutdown

From: Fei Li

Date: Fri May 15 2026 - 03:13:04 EST


On 2026-05-11 at 21:57:37 +0800, Sicong Huang wrote:
> ACRN irqfd registers a custom waitqueue callback on the eventfd. When the
> eventfd is released, eventfd_release() wakes the waitqueue with EPOLLHUP,
> and hsm_irqfd_wakeup() queues irqfd->shutdown on vm->irqfd_wq.
>
> The irqfd object can also be removed by ACRN_IOCTL_IRQFD with
> ACRN_IRQFD_FLAG_DEASSIGN. In that path, acrn_irqfd_deassign() removes the
> waitqueue entry and frees the hsm_irqfd object.
>
> These two paths can race. If EPOLLHUP queues the shutdown work before
> deassign frees the object, the work item may run after kfree() and recover
> the freed hsm_irqfd via container_of(). It then dereferences irqfd->vm while
> taking irqfds_lock.
>
> A possible race is:
> CPU0 CPU1
> eventfd_release()
> wake_up_poll(EPOLLHUP)
> hsm_irqfd_wakeup()
> queue_work(&irqfd->shutdown)
> acrn_irqfd_deassign()
> hsm_irqfd_shutdown()
> list_del_init()
> eventfd_ctx_remove_wait_queue()
> kfree(irqfd) //free here!
> hsm_irqfd_shutdown_work()
> irqfd = container_of(work, ...)
> vm = irqfd->vm //UAF!
>
> Fix this by separating logical shutdown from object release. First remove
> the irqfd from the VM list and eventfd waitqueue, then synchronously
> cancel any pending/running shutdown work before freeing the object.
> Also tear down irqfds before destroying the irqfd workqueue, so
> eventfd wakeups cannot queue work after the workqueue has been destroyed.
>
> Signed-off-by: Sicong Huang <congei42@xxxxxxx>

Hi Sicong,

Thanks for fixing this. The race looks real to me: `EPOLLHUP` can queue
`irqfd->shutdown`, while the deassign path can remove the waitqueue entry and
free the same irqfd before the work item runs.

The direction is good, but I think the lifetime rules can be made simpler. In
this version the final `kfree()` can still come from several places, with
`list_empty()` and `cancel_work_sync()` deciding who wins. That may work, but
it is harder to audit than necessary.

Could this follow the KVM irqfd pattern more closely? Deassign/deinit would
only deactivate the irqfd, i.e. remove it from the active list and queue cleanup
once. The cleanup work would then be the only place that removes the eventfd
waitqueue entry, drops the eventfd ref, and frees `struct hsm_irqfd`. The
synchronous paths can still flush the workqueue before returning or before
destroying `vm->irqfd_wq`.

Two smaller comments:

- `next` is left unused in `acrn_irqfd_deinit()` after this change.
- It may be worth tightening the assign path too, so an irqfd is not visible on
`vm->irqfds` before its eventfd waitqueue entry has been installed.

Thanks,
Fei

> ---
> drivers/virt/acrn/irqfd.c | 47 ++++++++++++++++++++++++++++++---------
> 1 file changed, 37 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/virt/acrn/irqfd.c b/drivers/virt/acrn/irqfd.c
> index acf8cd5f8f8c..659fd40d9aa5 100644
> --- a/drivers/virt/acrn/irqfd.c
> +++ b/drivers/virt/acrn/irqfd.c
> @@ -44,30 +44,37 @@ static void acrn_irqfd_inject(struct hsm_irqfd *irqfd)
> irqfd->msi.msi_data);
> }
>
> -static void hsm_irqfd_shutdown(struct hsm_irqfd *irqfd)
> +static bool hsm_irqfd_shutdown(struct hsm_irqfd *irqfd)
> {
> u64 cnt;
>
> lockdep_assert_held(&irqfd->vm->irqfds_lock);
>
> + if (list_empty(&irqfd->list))
> + return false;
> +
> /* remove from wait queue */
> list_del_init(&irqfd->list);
> eventfd_ctx_remove_wait_queue(irqfd->eventfd, &irqfd->wait, &cnt);
> eventfd_ctx_put(irqfd->eventfd);
> - kfree(irqfd);
> +
> + return true;
> }
>
> static void hsm_irqfd_shutdown_work(struct work_struct *work)
> {
> struct hsm_irqfd *irqfd;
> struct acrn_vm *vm;
> + bool free;
>
> irqfd = container_of(work, struct hsm_irqfd, shutdown);
> vm = irqfd->vm;
> mutex_lock(&vm->irqfds_lock);
> - if (!list_empty(&irqfd->list))
> - hsm_irqfd_shutdown(irqfd);
> + free = hsm_irqfd_shutdown(irqfd);
> mutex_unlock(&vm->irqfds_lock);
> +
> + if (free)
> + kfree(irqfd);
> }
>
> /* Called with wqh->lock held and interrupts disabled */
> @@ -170,7 +177,7 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
> static int acrn_irqfd_deassign(struct acrn_vm *vm,
> struct acrn_irqfd *args)
> {
> - struct hsm_irqfd *irqfd, *tmp;
> + struct hsm_irqfd *irqfd, *tmp, *to_free = NULL;
> struct eventfd_ctx *eventfd;
>
> eventfd = eventfd_ctx_fdget(args->fd);
> @@ -180,13 +187,19 @@ static int acrn_irqfd_deassign(struct acrn_vm *vm,
> mutex_lock(&vm->irqfds_lock);
> list_for_each_entry_safe(irqfd, tmp, &vm->irqfds, list) {
> if (irqfd->eventfd == eventfd) {
> - hsm_irqfd_shutdown(irqfd);
> + if (hsm_irqfd_shutdown(irqfd))
> + to_free = irqfd;
> break;
> }
> }
> mutex_unlock(&vm->irqfds_lock);
> eventfd_ctx_put(eventfd);
>
> + if (to_free) {
> + cancel_work_sync(&to_free->shutdown);
> + kfree(to_free);
> + }
> +
> return 0;
> }
>
> @@ -219,9 +232,23 @@ void acrn_irqfd_deinit(struct acrn_vm *vm)
> struct hsm_irqfd *irqfd, *next;
>
> dev_dbg(acrn_dev.this_device, "VM %u irqfd deinit.\n", vm->vmid);
> +
> + for (;;) {
> + irqfd = NULL;
> +
> + mutex_lock(&vm->irqfds_lock);
> + if (!list_empty(&vm->irqfds)) {
> + irqfd = list_first_entry(&vm->irqfds, struct hsm_irqfd, list);
> + hsm_irqfd_shutdown(irqfd);
> + }
> + mutex_unlock(&vm->irqfds_lock);
> +
> + if (!irqfd)
> + break;
> +
> + cancel_work_sync(&irqfd->shutdown);
> + kfree(irqfd);
> + }
> +
> destroy_workqueue(vm->irqfd_wq);
> - mutex_lock(&vm->irqfds_lock);
> - list_for_each_entry_safe(irqfd, next, &vm->irqfds, list)
> - hsm_irqfd_shutdown(irqfd);
> - mutex_unlock(&vm->irqfds_lock);
> }
> --
> 2.34.1
>