Re: [PATCH v2] KVM: irqfd: fix deadlock by moving synchronize_srcu out of resampler_lock

From: Sonam Sanju

Date: Wed Apr 01 2026 - 06:05:06 EST

On Tue, Mar 31, 2026 at 01:51:00PM -0700, Paul E. McKenney wrote:
> On Tue, Mar 31, 2026 at 11:17:19AM -0700, Sean Christopherson wrote:
> > Please don't post subsequent versions In-Reply-To previous versions, it tends to
> > muck up tooling.

Noted, will send future versions as new top-level threads. Sorry about
that.

> > Unless I'm misunderstanding the bug, "fixing" in this in KVM is papering over an
> > underlying flaw. Essentially, this would be establishing a rule that
> > synchronize_srcu_expedited() can *never* be called while holding a mutex. That's
> > not viable.
>
> First, it is OK to invoke synchronize_srcu_expedited() while holding
> a mutex. Second, the synchronize_srcu_expedited() function's use of
> workqueues is the same as that of synchronize_srcu(), so in an alternate
> universe where it was not OK to invoke synchronize_srcu_expedited() while
> holding a mutex, it would also not be OK to invoke synchronize_srcu()
> while holding that same mutex. Third, it is also OK to acquire that
> same mutex within a workqueue handler. Fourth, SRCU and RCU use their
> own workqueue, which no one else should be using (and that prohibition
> most definitely includes the irqfd workers).

Thank you for clarifying this.

> As a result, I do have to ask... When you say "multiple irqfd workers",
> exactly how many such workers are you running?

While running cold-reboot/warm-reboot cycling on our Android platforms
with the 6.18 kernel, the hung_task traces consistently show 8-15
kvm-irqfd-cleanup workers in D state. These are crosvm instances with
roughly 10-16 irqfd lines per VM (virtio-blk, virtio-net, virtio-input,
virtio-snd, etc., each with a resampler).

Vineeth Pillai (Google) reproduced a related scenario in a VM
create/destroy stress test, where the workqueue reached active=1024
refcnt=2062. That is a much more extreme case than what we see during
normal shutdown.

The first part of the deadlock is real: one worker holds
resampler_lock and blocks in synchronize_srcu_expedited(), while the
remaining 8-15 workers block in __mutex_lock() called from
irqfd_resampler_shutdown().
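A rough userspace analogy of that queueing pattern, assuming a pthread
mutex in place of resampler_lock and a plain sleep in place of
synchronize_srcu_expedited(). All names here (fake_synchronize_srcu(),
cleanup_worker(), run_cleanup_workers(), peak_concurrent_syncs(), the
20 ms "grace period") are hypothetical. It only models how workers pile
up behind a mutex held across a blocking wait; it does not model SRCU's
own workqueue, and it says nothing about whether moving the wait out of
the lock is correct for KVM:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/*
 * Userspace analogy only, not KVM code: "resampler_lock" is a pthread
 * mutex and fake_synchronize_srcu() merely sleeps for a hypothetical
 * fixed grace period. All names here are made up for illustration.
 */
#define GRACE_PERIOD_NS 20000000L /* hypothetical 20 ms grace period */
#define MAX_WORKERS 64

static pthread_mutex_t resampler_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int in_sync;     /* workers currently "waiting for a GP" */
static atomic_int max_in_sync; /* peak value of in_sync during a run */
static int sync_under_lock;    /* 1: wait under the lock, 0: wait after unlock */

/* Stand-in for synchronize_srcu_expedited(): block for one "grace period". */
static void fake_synchronize_srcu(void)
{
	int now = atomic_fetch_add(&in_sync, 1) + 1;
	int peak = atomic_load(&max_in_sync);
	struct timespec gp = { 0, GRACE_PERIOD_NS };

	/* Record the highest number of overlapping waits seen so far. */
	while (now > peak &&
	       !atomic_compare_exchange_weak(&max_in_sync, &peak, now))
		;
	/* Sleep the full period even if a signal interrupts us. */
	while (nanosleep(&gp, &gp) == -1 && errno == EINTR)
		;
	atomic_fetch_sub(&in_sync, 1);
}

/* Stand-in for one kvm-irqfd-cleanup work item tearing down a resampler. */
static void *cleanup_worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&resampler_lock);
	/* ... unlink this irqfd from the shared resampler (elided) ... */
	if (sync_under_lock)
		fake_synchronize_srcu(); /* everyone else queues on the mutex */
	pthread_mutex_unlock(&resampler_lock);
	if (!sync_under_lock)
		fake_synchronize_srcu(); /* waits may overlap; lock is free */
	return NULL;
}

/* Run nworkers cleanups concurrently; return elapsed ms, or -1 on error. */
long run_cleanup_workers(int nworkers, int under_lock)
{
	pthread_t tids[MAX_WORKERS];
	struct timespec start, end;
	int created = 0;
	long long ns;

	if (nworkers < 1 || nworkers > MAX_WORKERS)
		return -1;
	sync_under_lock = under_lock;
	atomic_store(&in_sync, 0);
	atomic_store(&max_in_sync, 0);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (int i = 0; i < nworkers; i++)
		if (pthread_create(&tids[created], NULL, cleanup_worker, NULL) == 0)
			created++;
	for (int i = 0; i < created; i++)
		pthread_join(tids[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &end);

	if (created != nworkers)
		return -1;
	ns = (end.tv_sec - start.tv_sec) * 1000000000LL +
	     (end.tv_nsec - start.tv_nsec);
	return (long)(ns / 1000000);
}

/* Peak number of simultaneous "grace-period" waits in the last run. */
int peak_concurrent_syncs(void)
{
	return atomic_load(&max_in_sync);
}
```

With under_lock set, every wait happens inside the critical section, so
N workers take at least N grace periods end to end and
peak_concurrent_syncs() stays at 1; with the wait after the unlock, the
waits are free to overlap.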

Thanks,
Sonam