Re: [PATCH v2 4/5] workqueue: Show all busy workers in stall diagnostics
From: Petr Mladek
Date: Fri Mar 13 2026 - 12:28:10 EST
On Fri 2026-03-13 05:57:59, Breno Leitao wrote:
> On Thu, Mar 12, 2026 at 06:03:03PM +0100, Petr Mladek wrote:
> > On Thu 2026-03-05 08:15:40, Breno Leitao wrote:
> > > show_cpu_pool_hog() only prints workers whose task is currently running
> > > on the CPU (task_is_running()). This misses workers that are busy
> > > processing a work item but are sleeping or blocked — for example, a
> > > worker that clears PF_WQ_WORKER and enters wait_event_idle().
> >
> > IMHO, it is misleading. AFAIK, workers clear PF_WQ_WORKER flag only
> > when they are going to die. They never do so when going to sleep.
> >
> > > Such a
> > > worker still occupies a pool slot and prevents progress, yet produces
> > > an empty backtrace section in the watchdog output.
> > >
> > > This is happening on real arm64 systems, where
> > > toggle_allocation_gate() IPIs every single CPU in the machine (which
> > > lacks NMI), causing workqueue stalls that show empty backtraces because
> > > toggle_allocation_gate() is sleeping in wait_event_idle().
> >
> > The wait_event_idle() called in toggle_allocation_gate() should not
> > cause a stall. The scheduler should call wq_worker_sleeping(tsk)
> > and wake up another idle worker. It should guarantee the progress.
> >
> > > Remove the task_is_running() filter so every in-flight worker in the
> > > pool's busy_hash is dumped. The busy_hash is protected by pool->lock,
> > > which is already held.
> >
> > As I explained in reply to the cover letter, sleeping workers should
> > not block forward progress. It seems that in this case, the system was
> > not able to wake up the other idle worker or it was the last idle
> > worker and was not able to fork a new one.
> >
> > IMHO, we should warn about this when there is no running worker.
> > It might be more useful than printing backtraces of the sleeping
> > workers because they likely did not cause the problem.
> >
> > I believe that the problem, in this particular situation, is that
> > the system can't schedule or fork new processes. It might help
> > to warn about it and maybe show backtrace of the currently
> > running process on the stalled CPU.
>
> Do you mean checking if pool->busy_hash is empty, and then warning?
>
> Commit fc36ad49ce7160907bcbe4f05c226595611ac293
> Author: Breno Leitao <leitao@xxxxxxxxxx>
> Date: Fri Mar 13 05:35:02 2026 -0700
>
> workqueue: warn when stalled pool has no running workers
>
> When the workqueue watchdog detects a pool stall and the pool's
> busy_hash is empty (no workers executing any work item), print a
> diagnostic warning with the pool state and trigger a backtrace of
> the currently running task on the stalled CPU.
>
> Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
> Suggested-by: Petr Mladek <pmladek@xxxxxxxx>
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 6ee52ba9b14f7..d538067754123 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -7655,6 +7655,17 @@ static void show_cpu_pool_busy_workers(struct worker_pool *pool)
>
>  	raw_spin_lock_irqsave(&pool->lock, irq_flags);
>
> +	if (hash_empty(pool->busy_hash)) {
This would print the diagnostics only when there is no in-flight work.
But I think that the problem is when there is no worker in
the running state. There should always be one to guarantee
forward progress.
I took inspiration from your patch. This is what comes to mind
on top of the current master (printing only the running workers):
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index aeaec79bc09c..a044c7e42139 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7588,12 +7588,15 @@ static void show_cpu_pool_hog(struct worker_pool *pool)
 {
 	struct worker *worker;
 	unsigned long irq_flags;
+	bool found_running;
 	int bkt;
 
 	raw_spin_lock_irqsave(&pool->lock, irq_flags);
 
+	found_running = false;
 	hash_for_each(pool->busy_hash, bkt, worker, hentry) {
 		if (task_is_running(worker->task)) {
+			found_running = true;
 			/*
 			 * Defer printing to avoid deadlocks in console
 			 * drivers that queue work while holding locks
@@ -7609,6 +7612,19 @@ static void show_cpu_pool_hog(struct worker_pool *pool)
 	}
 
 	raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
+
+	if (!found_running) {
+		pr_info("pool %d: no worker in running state, cpu=%d is %s (nr_workers=%d nr_idle=%d)\n",
+			pool->id, pool->cpu,
+			idle_cpu(pool->cpu) ? "idle" : "busy",
+			pool->nr_workers, pool->nr_idle);
+		pr_info("The pool might have trouble waking up another idle worker.\n");
+		if (pool->manager) {
+			pr_info("Backtrace of the pool manager:\n");
+			sched_show_task(pool->manager->task);
+		}
+		trigger_single_cpu_backtrace(pool->cpu);
+	}
 }
 
 static void show_cpu_pools_hogs(void)
Warning: The code is not safe as-is. We would need to add some
synchronization around the pool->manager pointer.
Even better might be to print the state and backtrace of the process
which was woken by kick_pool() when the last running worker
went to sleep.
Motivation: AFAIK, if there is pending work in a CPU-bound workqueue,
	then at least one worker in the related worker pool should be
	in the "task_is_running()" state to guarantee forward progress.

	If we find the running worker then it is likely the culprit.
	Either it has been running for too long, or it was the last
	idle worker and failed to create a new one.

	If there is no worker in the running state then there is likely
	a problem in the core workqueue code, or some work item shot
	the workqueue in the foot. Either way, we might need to print
	many more details to nail it down.
Best Regards,
Petr