Re: [PATCH v2] KVM: irqfd: fix deadlock by moving synchronize_srcu out of resampler_lock
From: Sonam Sanju
Date: Thu Apr 23 2026 - 05:09:52 EST
Hello Tejun,
Thank you for the detailed analysis.
On Wed, Apr 23, 2026, Tejun Heo wrote:
> The problem with this theory is that this kworker, while preempted, is still
> runnable and should be dispatched to its CPU once it becomes available
> again. Workqueue doesn't care whether the task gets preempted or when it
> gets the CPU back. It only cares about whether the task enters blocking
> state (!runnable). A task which is preempted, even on the way to blocking,
> still is runnable and should get put back on the CPU by the scheduler.
>
> If you can take a crashdump of the deadlocked state, can you see whether the
> task is still on the scheduler's runqueue?
I instrumented show_one_worker_pool() to dump the scheduler state of each busy
worker once the pool has been hung for more than 30 seconds.
Every busy worker shows on_rq=0, so none of them is on a runqueue.
== Pool state ==
pool 2: cpus=0 node=0 flags=0x0 nice=0 hung=47s
workers=13 nr_running=1 nr_idle=7
== Per-worker scheduler state (first dump at t=62.5s) ==
PID  | state | on_rq | se.on_rq | sched_delayed | sleeping | blocked_on
-----|-------|-------|----------|---------------|----------|------------------------
4819 | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
4823 | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
4818 | 0x2   | 0     | 0        | 0             | 0        | ffff953608205210 type=1
  11 | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
   9 | 0x2   | 0     | 0        | 0             | 1        | ffff953608205210 type=1
4814 | 0x2   | 0     | 0        | 0             | 1        | (mutex holder)
All 6 workers are in kvm-irqfd-cleanup, calling irqfd_shutdown ->
irqfd_resampler_shutdown. Five of them wait on the same resampler->lock
mutex (ffff953608205210); the sixth, PID 4814, holds it.
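To make the table easier to cross-check, here is a small userspace sketch (not
the kernel instrumentation itself) that re-enters the six rows and counts them.
The struct layout and the helper names count_waiters(), count_on_rq() and
count_not_sleeping() are made up for illustration; only the values come from
the dump, and the "(mutex holder)" row is encoded as blocked_on = 0 by
assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of one row of the per-worker dump. */
struct worker_row {
	int pid;
	unsigned int state;	/* task state as printed (0x2 for every row) */
	int on_rq;
	int se_on_rq;
	int sched_delayed;
	int sleeping;
	uint64_t blocked_on;	/* lock address; 0 encodes "(mutex holder)" */
};

/* Rows copied by hand from the first dump at t=62.5s. */
static const struct worker_row rows[] = {
	{ 4819, 0x2, 0, 0, 0, 1, 0xffff953608205210ULL },
	{ 4823, 0x2, 0, 0, 0, 1, 0xffff953608205210ULL },
	{ 4818, 0x2, 0, 0, 0, 0, 0xffff953608205210ULL },
	{   11, 0x2, 0, 0, 0, 1, 0xffff953608205210ULL },
	{    9, 0x2, 0, 0, 0, 1, 0xffff953608205210ULL },
	{ 4814, 0x2, 0, 0, 0, 1, 0 },
};
#define NROWS (sizeof(rows) / sizeof(rows[0]))

/* Number of rows whose blocked_on matches the given lock address. */
static size_t count_waiters(const struct worker_row *r, size_t n, uint64_t lock)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (r[i].blocked_on == lock)
			count++;
	return count;
}

/* Number of rows that still have on_rq or se.on_rq set. */
static size_t count_on_rq(const struct worker_row *r, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (r[i].on_rq || r[i].se_on_rq)
			count++;
	return count;
}

/* Number of rows where the "sleeping" column is 0. */
static size_t count_not_sleeping(const struct worker_row *r, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (!r[i].sleeping)
			count++;
	return count;
}
```

For these rows, count_waiters(rows, NROWS, 0xffff953608205210ULL) gives 5,
count_on_rq(rows, NROWS) gives 0, and count_not_sleeping(rows, NROWS) gives 1
(PID 4818), which at least numerically lines up with nr_running=1 in the pool
state line.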
Full logs: https://gist.github.com/sonam-sanju/08042878542b7a58d2818e6076554211
Thanks,
Sonam