Re: [PATCH v2 4/5] workqueue: Show all busy workers in stall diagnostics

From: Song Liu

Date: Thu Mar 05 2026 - 12:24:43 EST


On Thu, Mar 5, 2026 at 8:16 AM Breno Leitao <leitao@xxxxxxxxxx> wrote:
>
> show_cpu_pool_hog() only prints workers whose task is currently running
> on the CPU (task_is_running()). This misses workers that are busy
> processing a work item but are sleeping or blocked — for example, a
> worker that clears PF_WQ_WORKER and enters wait_event_idle(). Such a
> worker still occupies a pool slot and prevents progress, yet produces
> an empty backtrace section in the watchdog output.
>
> This happens on real arm64 systems (which lack NMI), where
> toggle_allocation_gate() sends an IPI to every CPU in the machine. The
> resulting workqueue stalls show empty backtraces because
> toggle_allocation_gate() is sleeping in wait_event_idle().
>
> Remove the task_is_running() filter so every in-flight worker in the
> pool's busy_hash is dumped. The busy_hash is protected by pool->lock,
> which is already held.
>
> Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>

Acked-by: Song Liu <song@xxxxxxxxxx>