Re: [PATCH 0/4] workqueue: Detect stalled in-flight workers

From: Breno Leitao

Date: Wed Mar 04 2026 - 10:45:32 EST


Hello Tejun,

On Wed, Feb 11, 2026 at 08:56:11AM -1000, Tejun Heo wrote:
> On Wed, Feb 11, 2026 at 04:29:14AM -0800, Breno Leitao wrote:
> > The workqueue watchdog detects pools that haven't made forward progress
> > by checking whether pending work items on the worklist have been waiting
> > too long. However, this approach has a blind spot: if a pool has only
> > one work item and that item has already been dequeued and is executing on
> > a worker, the worklist is empty and the watchdog skips the pool entirely.
> > This means a single hogged worker with no other pending work is invisible
> > to the stall detector.
> >
> > I was able to come up with the following example that shows this blind
> > spot:
> >
> > static void stall_work_fn(struct work_struct *work)
> > {
> > 	for (;;) {
> > 		mdelay(1000);
> > 		cond_resched();
> > 	}
> > }
>
> Workqueue doesn't require users to limit execution time. As long as there is
> enough supply of concurrency to avoid stalling of pending work items, work
> items can run as long as they want, including indefinitely. Workqueue stall
> is there to indicate that there is insufficient supply of concurrency.

Thank you for the clarification. Let me share more context about the
actual problem I am observing so we can think through it together.

On some production hosts, I am seeing a workqueue stall where no
backtraces are printed:

BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0 stuck for 132s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x100
  pwq 18: cpus=4 node=0 flags=0x0 nice=0 active=4 refcnt=5
    in-flight: 178:stall_work1_fn [wq_stall]
    pending: stall_work2_fn [wq_stall], free_obj_work, psi_avgs_work
workqueue mm_percpu_wq: flags=0x108
  pwq 18: cpus=4 node=0 flags=0x0 nice=0 active=1 refcnt=2
    pending: vmstat_update
pool 18: cpus=4 node=0 flags=0x0 nice=0 hung=132s workers=2 idle: 45
Showing backtraces of running workers in stalled CPU-bound worker pools:
<nothing here>

We initially suspected a TOCTOU issue, and Omar put together a patch to
check for that, but it did not identify anything.

After digging deeper, I believe I have found the root cause along with
a reproducer[1]:

1) kfence executes toggle_allocation_gate() as a delayed workqueue
   item (kfence_timer) on the system WQ.

2) toggle_allocation_gate() enables a static key, which IPIs every
   CPU to patch code:
       static_branch_enable(&kfence_allocation_key);

3) toggle_allocation_gate() then sleeps in TASK_IDLE waiting for a
   kfence allocation to occur:
       wait_event_idle(allocation_wait,
                       atomic_read(&kfence_allocation_gate) > 0 || ...);

   This can last indefinitely if no allocation goes through the
   kfence path. The worker remains in the pool's busy_hash
   (in-flight) but is no longer task_is_running().

4) The workqueue watchdog detects the stall and calls
   show_cpu_pool_hog(), which only prints backtraces for workers
   that are actively running on CPU:

       static void show_cpu_pool_hog(struct worker_pool *pool)
       {
               ...
               if (task_is_running(worker->task))
                       sched_show_task(worker->task);
       }

5) Nothing is printed because the offending worker is in TASK_IDLE
   state. The output shows "Showing backtraces of running workers in
   stalled CPU-bound worker pools:" followed by nothing, effectively
   hiding the actual culprit.
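
To make the worker state from step 3 concrete, here is a minimal sketch
(my own illustration, not the actual reproducer from [1]; all identifiers
below are made up) of a work item that parks itself in TASK_IDLE on a wait
queue that is never woken. Such a worker sits in the pool's busy_hash as
in-flight, yet task_is_running() is false for it, so the current
show_cpu_pool_hog() prints nothing:

	#include <linux/atomic.h>
	#include <linux/module.h>
	#include <linux/wait.h>
	#include <linux/workqueue.h>

	static DECLARE_WAIT_QUEUE_HEAD(never_woken);
	static atomic_t gate = ATOMIC_INIT(0);

	static void sleepy_work_fn(struct work_struct *work)
	{
		/* Sleep in TASK_IDLE; nothing ever sets 'gate', so this blocks forever. */
		wait_event_idle(never_woken, atomic_read(&gate) > 0);
	}
	static DECLARE_WORK(sleepy_work, sleepy_work_fn);

	static int __init sleepy_init(void)
	{
		/* Queue on the per-CPU system workqueue, like kfence_timer. */
		queue_work(system_wq, &sleepy_work);
		return 0;
	}

	static void __exit sleepy_exit(void)
	{
		/* Release the worker so the module can unload cleanly. */
		atomic_set(&gate, 1);
		wake_up(&never_woken);
		flush_work(&sleepy_work);
	}

	module_init(sleepy_init);
	module_exit(sleepy_exit);
	MODULE_LICENSE("GPL");

By itself this fragment does not trigger the watchdog; it only
demonstrates the sleeping in-flight worker that the task_is_running()
filter hides from the backtrace output.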

The fix I am considering is to remove the task_is_running() filter in
show_cpu_pool_hog() so that all in-flight workers in stalled pools have
their backtraces printed, regardless of whether they are running or
sleeping. This would make sleeping culprits like toggle_allocation_gate()
visible in the watchdog output.

When I test without the task_is_running() check, I see the culprit.

Fix I am testing:

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index aeaec79bc09c4..3f5ee08f99313 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7593,19 +7593,17 @@ static void show_cpu_pool_hog(struct worker_pool *pool)
 	raw_spin_lock_irqsave(&pool->lock, irq_flags);
 
 	hash_for_each(pool->busy_hash, bkt, worker, hentry) {
-		if (task_is_running(worker->task)) {
-			/*
-			 * Defer printing to avoid deadlocks in console
-			 * drivers that queue work while holding locks
-			 * also taken in their write paths.
-			 */
-			printk_deferred_enter();
+		/*
+		 * Defer printing to avoid deadlocks in console
+		 * drivers that queue work while holding locks
+		 * also taken in their write paths.
+		 */
+		printk_deferred_enter();
 
-			pr_info("pool %d:\n", pool->id);
-			sched_show_task(worker->task);
+		pr_info("pool %d:\n", pool->id);
+		sched_show_task(worker->task);
 
-			printk_deferred_exit();
-		}
+		printk_deferred_exit();
 	}
 
 	raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
@@ -7616,7 +7614,7 @@ static void show_cpu_pools_hogs(void)
 	struct worker_pool *pool;
 	int pi;
 
-	pr_info("Showing backtraces of running workers in stalled CPU-bound worker pools:\n");
+	pr_info("Showing backtraces of in-flight workers in stalled CPU-bound worker pools:\n");
 
 	rcu_read_lock();

Then I see:

BUG: workqueue lockup - pool cpus=6 node=0 flags=0x0 nice=0 stuck for 34s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x100
  pwq 26: cpus=6 node=0 flags=0x0 nice=0 active=3 refcnt=4
    in-flight: 161:stall_work1_fn [wq_stall]
    pending: stall_work2_fn [wq_stall], psi_avgs_work
workqueue mm_percpu_wq: flags=0x108
  pwq 26: cpus=6 node=0 flags=0x0 nice=0 active=1 refcnt=2
    pending: vmstat_update
pool 26: cpus=6 node=0 flags=0x0 nice=0 hung=34s workers=3 idle: 210 57
Showing backtraces of in-flight workers in stalled CPU-bound worker pools:
pool 26:
task:kworker/6:1 state:I stack:0 pid:161 tgid:161 ppid:2 task_flags:0x4208040 flags:0x00080000
Call Trace:
 <TASK>
 __schedule+0x1521/0x5360
 ? console_trylock+0x40/0x40
 ? preempt_count_add+0x92/0x1a0
 ? do_raw_spin_lock+0x12c/0x2f0
 ? is_mmconf_reserved+0x390/0x390
 ? schedule+0x91/0x350
 ? schedule+0x91/0x350
 schedule+0x165/0x350
 stall_work1_fn+0x17f/0x250 [wq_stall]


Link: https://github.com/leitao/debug/blob/main/workqueue_stall/wq_stall.c [1]