[PATCH v2 0/5] workqueue: Detect stalled in-flight workers
From: Breno Leitao
Date: Thu Mar 05 2026 - 11:16:05 EST
There is a blind spot in the workqueue stall detector (aka
show_cpu_pool_hog()). It only prints workers for which task_is_running()
is true, so a busy worker that is sleeping (e.g. in wait_event_idle())
produces an empty backtrace section even though it is the cause of the
stall.
Additionally, when the watchdog does report stalled pools, the output
doesn't show how long each in-flight work item has been running, making
it harder to identify which specific worker is stuck.
Example output, produced with the sample code from this series:
BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0 stuck for 132s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x100
pwq 18: cpus=4 node=0 flags=0x0 nice=0 active=4 refcnt=5
in-flight: 178:stall_work1_fn [wq_stall]
pending: stall_work2_fn [wq_stall], free_obj_work, psi_avgs_work
...
Showing backtraces of running workers in stalled CPU-bound worker pools:
<nothing here>
I have seen this happen on real machines, producing stall reports with
no backtrace at all. This is one of the code paths:
1) kfence executes toggle_allocation_gate() as a delayed workqueue
item (kfence_timer) on the system WQ.
2) toggle_allocation_gate() enables a static key, which IPIs every
CPU to patch code:
static_branch_enable(&kfence_allocation_key);
3) toggle_allocation_gate() then sleeps in TASK_IDLE waiting for a
kfence allocation to occur:
wait_event_idle(allocation_wait,
atomic_read(&kfence_allocation_gate) > 0 || ...);
This can last indefinitely if no allocation goes through the
kfence path (or if IPIing all the CPUs takes longer, which is common
on platforms that do not have NMI).
The worker remains in the pool's busy_hash
(in-flight) but is no longer task_is_running().
4) The workqueue watchdog detects the stall and calls
show_cpu_pool_hog(), which only prints backtraces for workers
that are actively running on CPU:
static void show_cpu_pool_hog(struct worker_pool *pool)
{
        ...
        if (task_is_running(worker->task))
                sched_show_task(worker->task);
}
5) Nothing is printed because the offending worker is in TASK_IDLE
state. The output shows "Showing backtraces of running workers in
stalled CPU-bound worker pools:" followed by nothing, effectively
hiding the actual culprit.
Since I use this detector a lot, I am also proposing some additional
improvements in this series.
This series addresses these issues:
Patch 1 fixes a minor semantic inconsistency where pool flags were
checked against a workqueue-level constant (WQ_BH instead of POOL_BH).
No behavioral change since both constants have the same value.
Patch 2 renames pool->watchdog_ts to pool->last_progress_ts to better
describe what the timestamp actually tracks.
Patch 3 adds a current_start timestamp to struct worker, recording when
a work item began executing. This is printed in show_pwq() as elapsed
wall-clock time (e.g., "in-flight: 165:stall_work_fn [wq_stall] for
100s"), giving immediate visibility into how long each worker has been
busy.
Patch 4 removes the task_is_running() filter from show_cpu_pool_hog()
so that every in-flight worker in the pool's busy_hash is dumped. This
catches workers that are busy but sleeping or blocked, which were
previously invisible in the watchdog output.
With this series applied, the stall output shows a backtrace for every
in-flight worker, along with how long each work item has been running.
Example:
BUG: workqueue lockup - pool cpus=14 node=0 flags=0x0 nice=0 stuck for 42s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x100
pwq 2: cpus=0 node=0 flags=0x0 nice=0 active=1 refcnt=2
pending: vmstat_shepherd
pwq 58: cpus=14 node=0 flags=0x0 nice=0 active=4 refcnt=5
in-flight: 184:stall_work1_fn [wq_stall] for 39s
...
Showing backtraces of busy workers in stalled CPU-bound worker pools:
pool 58:
task:kworker/14:1 state:I stack:0 pid:184 tgid:184 ppid:2 task_flags:0x4208040 flags:0x00080000
Call Trace:
<TASK>
__schedule+0x1521/0x5360
schedule+0x165/0x350
stall_work1_fn+0x17f/0x250 [wq_stall]
...
---
Changes in v2:
- Drop the task_is_running() filter in show_cpu_pool_hog() instead of assuming
a work item cannot stay running forever.
- Add sample code to exercise the stall detector
- Link to v1: https://patch.msgid.link/20260211-wqstall_start-at-v1-0-bd9499a18c19@xxxxxxxxxx
---
Breno Leitao (5):
workqueue: Use POOL_BH instead of WQ_BH when checking pool flags
workqueue: Rename pool->watchdog_ts to pool->last_progress_ts
workqueue: Show in-flight work item duration in stall diagnostics
workqueue: Show all busy workers in stall diagnostics
workqueue: Add stall detector sample module
kernel/workqueue.c | 47 +++++++-------
kernel/workqueue_internal.h | 1 +
samples/workqueue/stall_detector/Makefile | 1 +
samples/workqueue/stall_detector/wq_stall.c | 98 +++++++++++++++++++++++++++++
4 files changed, 124 insertions(+), 23 deletions(-)
---
base-commit: c107785c7e8dbabd1c18301a1c362544b5786282
change-id: 20260210-wqstall_start-at-e7319a005ab4
Best regards,
--
Breno Leitao <leitao@xxxxxxxxxx>