Re: [PATCH v2 4/5] workqueue: Show all busy workers in stall diagnostics
From: Jiri Slaby
Date: Mon May 11 2026 - 01:22:51 EST
Hi,
we currently have several reports of this, on s390, ppc64, and x86_64.
On 07. 05. 26, 15:11, Breno Leitao wrote:
Hi Jiri,
On Thu, May 07, 2026 at 12:20:33PM +0200, Jiri Slaby wrote:
On 05. 03. 26, 17:15, Breno Leitao wrote:
BUG: workqueue lockup - pool cpus=144 node=0 flags=0x4 nice=0 stuck for 168224s!
That's an extremely long stall (~1.95 days).
...
Showing busy workqueues and worker pools:
workqueue rcu_gp: flags=0x108
pwq 578: cpus=144 node=0 flags=0x4 nice=0 active=3 refcnt=4
in:
https://bugzilla.suse.com/show_bug.cgi?id=1263947
?
Can this (or another patch from the series) cause this? Should there be
something like cpu_online() instead of task_is_running() somewhere?
This series only affects stall reporting, not detection. The changes run
after the watchdog has identified a stall, so the detection logic itself
remains unchanged.
To help diagnose this issue, could you provide some additional information:
1) Was CPU 144 online at any point? If so, when was it taken offline?
It was not, it's non-present.
2) Does this message appear repeatedly? If you bring CPU 144 online, does
the issue resolve?
Yes, look at this new x86_64 report's dmesg (I believe it is related to the above report):
BUG: workqueue lockup - pool cpus=2 node=0 flags=0x4 nice=0 stuck for 50s!
in:
https://bugzilla.suse.com/attachment.cgi?id=890229
$ grep -c BUG sl.txt
504
$ grep -c pwq sl.txt
509
It comes from:
https://bugzilla.suse.com/show_bug.cgi?id=1264554
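To break the grep counts above down per pool, something like the following could be run over the saved dmesg. This is only a rough sketch: it assumes every lockup report is on a single line in the format quoted above, the function name summarize_lockups is made up for illustration, and the sample text (including its timestamps) is invented, not taken from the attachment.

```python
import re
from collections import Counter

# Matches the watchdog line format quoted in this thread, e.g.
# "BUG: workqueue lockup - pool cpus=2 node=0 flags=0x4 nice=0 stuck for 50s!"
# Assumption: the whole report sits on one line (no mail-client wrapping).
LOCKUP_RE = re.compile(
    r"BUG: workqueue lockup - pool cpus=(?P<cpus>\S+) node=(?P<node>\d+) "
    r".*?stuck for (?P<secs>\d+)s!"
)


def summarize_lockups(text):
    """Return (number of reports per pool, longest stall in seconds per pool).

    A pool is keyed by its (cpus, node) pair as printed in the log.
    """
    counts = Counter()
    longest = {}
    for m in LOCKUP_RE.finditer(text):
        pool = (m.group("cpus"), int(m.group("node")))
        secs = int(m.group("secs"))
        counts[pool] += 1
        longest[pool] = max(longest.get(pool, 0), secs)
    return counts, longest


# Invented sample input; a real run would read the saved log instead.
sample = """\
[  100.0] BUG: workqueue lockup - pool cpus=2 node=0 flags=0x4 nice=0 stuck for 50s!
[  130.0] BUG: workqueue lockup - pool cpus=2 node=0 flags=0x4 nice=0 stuck for 80s!
"""

if __name__ == "__main__":
    counts, longest = summarize_lockups(sample)
    for pool, n in counts.most_common():
        print(f"pool cpus={pool[0]} node={pool[1]}: {n} reports, longest {longest[pool]}s")
```

For the real log, pass the contents of sl.txt to summarize_lockups() instead of the sample. A single pool accounting for nearly all reports, with the stall time growing steadily, would suggest one pool that never makes progress rather than many independent stalls.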
3) Have you run similar tests on earlier kernel versions without seeing
this behavior, or is this a clear regression?
It's new in 7.0. Going back to 6.19.12 makes it disappear.
thanks,
--
js
suse labs