Re: [PATCH v2] KVM: irqfd: fix deadlock by moving synchronize_srcu out of resampler_lock

From: Paul E. McKenney

Date: Tue Mar 31 2026 - 16:54:19 EST


On Tue, Mar 31, 2026 at 11:17:19AM -0700, Sean Christopherson wrote:
> +srcu folks
>
> Please don't post subsequent versions In-Reply-To previous versions, it tends to
> muck up tooling.
>
> On Mon, Mar 23, 2026, Sonam Sanju wrote:
> > irqfd_resampler_shutdown() and kvm_irqfd_assign() both call
> > synchronize_srcu_expedited() while holding kvm->irqfds.resampler_lock.
> > This can deadlock when multiple irqfd workers run concurrently on the
> > kvm-irqfd-cleanup workqueue during VM teardown or when VMs are rapidly
> > created and destroyed:
> >
> > CPU A (mutex holder)               CPU B/C/D (mutex waiters)
> > irqfd_shutdown()                   irqfd_shutdown() / kvm_irqfd_assign()
> >  irqfd_resampler_shutdown()        irqfd_resampler_shutdown()
> >   mutex_lock(resampler_lock) <---- mutex_lock(resampler_lock) //BLOCKED
> >   list_del_rcu(...)                ...blocked...
> >   synchronize_srcu_expedited()     // Waiters block workqueue,
> >     // waits for SRCU grace           preventing SRCU grace
> >     // period which requires          period from completing
> >     // workqueue progress          --- DEADLOCK ---
> >
> > In irqfd_resampler_shutdown(), the synchronize_srcu_expedited() in
> > the else branch is called directly within the mutex. In the if-last
> > branch, kvm_unregister_irq_ack_notifier() also calls
> > synchronize_srcu_expedited() internally. In kvm_irqfd_assign(),
> > synchronize_srcu_expedited() is called after list_add_rcu() but
> > before mutex_unlock(). All paths can block indefinitely because:
> >
> > 1. synchronize_srcu_expedited() waits for an SRCU grace period
> > 2. SRCU grace period completion needs workqueue workers to run
> > 3. The blocked mutex waiters occupy workqueue slots preventing progress
>
> Unless I'm misunderstanding the bug, "fixing" this in KVM is papering over an
> underlying flaw. Essentially, this would be establishing a rule that
> synchronize_srcu_expedited() can *never* be called while holding a mutex. That's
> not viable.

First, it is OK to invoke synchronize_srcu_expedited() while holding
a mutex. Second, the synchronize_srcu_expedited() function's use of
workqueues is the same as that of synchronize_srcu(), so in an alternate
universe where it was not OK to invoke synchronize_srcu_expedited() while
holding a mutex, it would also not be OK to invoke synchronize_srcu()
while holding that same mutex. Third, it is also OK to acquire that
same mutex within a workqueue handler. Fourth, SRCU and RCU use their
own workqueue, which no one else should be using (and that prohibition
most definitely includes the irqfd workers).
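
For illustration only, here is a userspace sketch of that fourth point,
using pthreads rather than kernel code. Every name in it (gp_worker(),
toy_synchronize(), shutdown_worker(), run_toy_shutdowns()) is hypothetical,
and it models neither SRCU's real implementation nor the workqueue
subsystem. It only shows that when grace-period processing has a worker
of its own, waiters blocked on the mutex cannot stall the grace-period
wait of the mutex holder.

```c
#include <pthread.h>
#include <stddef.h>

/* Stand-in for kvm->irqfds.resampler_lock. */
static pthread_mutex_t resampler_lock = PTHREAD_MUTEX_INITIALIZER;

/* Toy grace-period state, protected by gp_lock. */
static pthread_mutex_t gp_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gp_request = PTHREAD_COND_INITIALIZER;
static pthread_cond_t gp_done = PTHREAD_COND_INITIALIZER;
static unsigned long gp_requested, gp_completed;
static int gp_stop;

static int shutdowns_finished;	/* protected by resampler_lock */

/* Dedicated worker: models a workqueue that only the toy "SRCU" uses. */
static void *gp_worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&gp_lock);
	for (;;) {
		while (gp_completed == gp_requested && !gp_stop)
			pthread_cond_wait(&gp_request, &gp_lock);
		if (gp_completed == gp_requested)
			break;	/* told to stop and nothing is pending */
		gp_completed = gp_requested;	/* "grace period" ends */
		pthread_cond_broadcast(&gp_done);
	}
	pthread_mutex_unlock(&gp_lock);
	return NULL;
}

/* Toy synchronize_srcu_expedited(): block until gp_worker catches up. */
static void toy_synchronize(void)
{
	unsigned long ticket;

	pthread_mutex_lock(&gp_lock);
	ticket = ++gp_requested;
	pthread_cond_signal(&gp_request);
	while (gp_completed < ticket)
		pthread_cond_wait(&gp_done, &gp_lock);
	pthread_mutex_unlock(&gp_lock);
}

/* Toy irqfd shutdown: waits for a grace period while holding the mutex. */
static void *shutdown_worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&resampler_lock);
	toy_synchronize();
	shutdowns_finished++;
	pthread_mutex_unlock(&resampler_lock);
	return NULL;
}

/* Run up to 64 contending shutdown workers; return how many finished. */
int run_toy_shutdowns(int nworkers)
{
	pthread_t gp, workers[64];
	int i;

	if (nworkers < 0 || nworkers > 64)
		return -1;
	gp_requested = gp_completed = 0;
	gp_stop = 0;
	shutdowns_finished = 0;

	pthread_create(&gp, NULL, gp_worker, NULL);
	for (i = 0; i < nworkers; i++)
		pthread_create(&workers[i], NULL, shutdown_worker, NULL);
	for (i = 0; i < nworkers; i++)
		pthread_join(workers[i], NULL);

	/* All shutdown workers are done; retire the dedicated worker. */
	pthread_mutex_lock(&gp_lock);
	gp_stop = 1;
	pthread_cond_signal(&gp_request);
	pthread_mutex_unlock(&gp_lock);
	pthread_join(gp, NULL);

	return shutdowns_finished;
}
```

Calling run_toy_shutdowns(8) starts eight workers that all contend for the
mutex. Each one in turn completes its wait because gp_worker() never needs
that mutex, nor a slot in the pool the blocked workers occupy.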

As a result, I do have to ask... When you say "multiple irqfd workers",
exactly how many such workers are you running?

Thanx, Paul

> > 4. The mutex holder never releases the lock -> deadlock