Re: [PATCH RFC 1/3] workqueue: only show running workers in stall diagnostics

From: Petr Mladek

Date: Fri Jun 19 2026 - 08:58:35 EST


On Tue 2026-06-16 09:44:39, Breno Leitao wrote:
> show_cpu_pool_busy_workers() dumps every in-flight worker in the pool's
> busy_hash, including workers that are not currently running on the CPU.
> Restore the task_is_running() filter so only running workers are dumped.
>
> When no running worker is found, the pool may be stuck, unable to wake an
> idle worker to process pending work, and the watchdog would otherwise
> give no feedback. Add show_pool_no_running_worker() to report the pool
> id, CPU, idle state, and worker counts in that case.
>
> The pool info message is printed inside pool->lock using
> printk_deferred_enter/exit, the same pattern used by the existing
> busy-worker loop, to avoid deadlocks with console drivers that queue
> work while holding locks also taken in their write paths.
>
> This has been running on the Meta fleet for a while and caught some real
> issues, for instance EFI stalls blocking the workqueue [1].
>
> Link: https://lore.kernel.org/all/20260616-efi_timeout-v3-0-76dd1d26657b@xxxxxxxxxx/ [1]
> Suggested-by: Petr Mladek <pmladek@xxxxxxxx>
> Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
> Fixes: 8823eaef45da7 ("workqueue: Show all busy workers in stall diagnostics")

It looks good to me. And it is good to know that it helped in
real life.

Reviewed-by: Petr Mladek <pmladek@xxxxxxxx>

Best Regards,
Petr