Re: [PATCH] KVM: irqfd: fix shutdown deadlock by moving SRCU sync outside resampler_lock

From: Vineeth Pillai (Google)

Date: Fri Mar 20 2026 - 08:56:53 EST


Hi Sonam,

> irqfd_resampler_shutdown() calls synchronize_srcu_expedited() while
> holding kvm->irqfds.resampler_lock. This can deadlock when multiple
> irqfd_shutdown workers run concurrently on the kvm-irqfd-cleanup
> workqueue during VM teardown (e.g. crosvm shutdown on Android):
>
> CPU A (mutex holder)                 CPU B/C/D (mutex waiters)
> irqfd_shutdown()                     irqfd_shutdown()
>   irqfd_resampler_shutdown()           irqfd_resampler_shutdown()
>     mutex_lock(resampler_lock) <----     mutex_lock(resampler_lock) // BLOCKED
>     list_del_rcu(...)                    ...blocked...
>     synchronize_srcu_expedited()      // Waiters block workqueue,
>     // waits for SRCU grace           // preventing SRCU grace
>     // period which requires          // period from completing
>     // workqueue progress             --- DEADLOCK ---

I think we might have the same issue in the kvm_irqfd_assign() path as
well, where synchronize_srcu_expedited() is called with resampler_lock
held. I saw a similar lockup during a stress test where VMs were created
and destroyed continuously. One task was waiting on an SRCU grace period:

[ T93] task:crosvm_security state:D stack:0 pid:8215 tgid:8215 ppid:1 task_flags:0x400000 flags:0x00080002.
[ T93] Call Trace:
[ T93] <TASK>
[ T93] __schedule+0x87a/0xd60
[ T93] schedule+0x5e/0xe0
[ T93] schedule_timeout+0x2e/0x130
[ T93] ? queue_delayed_work_on+0x7f/0xd0
[ T93] wait_for_common+0xf7/0x1f0
[ T93] synchronize_srcu_expedited+0x109/0x140
[ T93] ? __cfi_wakeme_after_rcu+0x10/0x10
[ T93] kvm_irqfd+0x362/0x5e0
[ T93] kvm_vm_ioctl+0x706/0x780
[ T93] ? fd_install+0x2c/0xf0
[ T93] __se_sys_ioctl+0x7a/0xd0
[ T93] do_syscall_64+0x61/0xf10
[ T93] ? arch_exit_to_user_mode_prepare+0x9/0xb0
[ T93] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ T93] RIP: 0033:0x79048f9bdd67
[ T93] RSP: 002b:00007ffc3aa82028 EFLAGS: 00000206.

And another task waiting on the mutex:

[ C0] task:kworker/11:2 state:R running task stack:0 pid:25180 tgid:25180 ppid:2 task_flags:0x4208060 flags:0x00080000
[ C0] Workqueue: kvm-irqfd-cleanup irqfd_shutdown
[ C0] Call Trace:
[ C0] <TASK>
[ C0] __schedule+0x87a/0xd60
[ C0] schedule+0x5e/0xe0
[ C0] schedule_preempt_disabled+0x10/0x20
[ C0] __mutex_lock+0x413/0xe40
[ C0] irqfd_resampler_shutdown+0x23/0x150
[ C0] irqfd_shutdown+0x66/0xc0
[ C0] process_scheduled_works+0x219/0x450
[ C0] worker_thread+0x30b/0x450
[ C0] ? __cfi_worker_thread+0x10/0x10
[ C0] kthread+0x230/0x270
[ C0] ? __cfi_kthread+0x10/0x10
[ C0] ret_from_fork+0xf2/0x150
[ C0] ? __cfi_kthread+0x10/0x10
[ C0] ret_from_fork_asm+0x1a/0x30
[ C0] </TASK>

I think the workqueue was saturated as well:

[ C0] pwq 46: cpus=11 node=0 flags=0x0 nice=0 active=1024 refcnt=2062

There were other tasks waiting for SRCU grace-period completion in the
resampler shutdown path. There were also other traces showing lockups
(mostly in mm), but I think those are a secondary effect of this lockup
and might not be relevant. I can provide the full logs if needed.

Please take a look and see whether this path also needs to be handled to
fully fix the issue.
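
If it helps, one direction could be to mirror your change here as well:
publish the entry under the lock, but wait for the grace period only
after dropping it. This is an untested sketch, and whether the grace
period can safely complete outside resampler_lock in the assign path
needs the same scrutiny you gave the shutdown path:

```c
	/* untested sketch, mirroring the shutdown-path fix */
	mutex_lock(&kvm->irqfds.resampler_lock);
	/* ... find or allocate resampler ... */
	list_add_rcu(&irqfd->resampler_link, &irqfd->resampler->list);
	mutex_unlock(&kvm->irqfds.resampler_lock);

	/*
	 * Wait for pre-existing SRCU readers without holding the mutex,
	 * so other resampler_lock acquirers (e.g. irqfd_shutdown workers
	 * on kvm-irqfd-cleanup) are not blocked behind the grace period.
	 */
	synchronize_srcu_expedited(&kvm->irq_srcu);
```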

Thanks,
Vineeth