Re: Workqueue regression

From: Konrad Dybcio
Date: Fri Feb 02 2024 - 07:32:31 EST


On 2.02.2024 02:52, Tejun Heo wrote:
> Hello,
>
> On Thu, Feb 01, 2024 at 09:57:59PM +0100, Konrad Dybcio wrote:
>> So, commit "Implement system-wide nr_active enforcement for unbound workqueues"
>> broke *something* and now performing a suspend-wakeup cycle on a Qualcomm
>> SC8280XP-based (arm64) platform hangs when performing the resume tasks,
>> presumably somewhere near PCIe reinitialization (but that may be a red herring).
>>
>> Reverting the commit (and the ones on top of it due to conflicts) fixes
>> the issue on next-20240130 and later (plus some out-of-tree patches that
>> are largely unrelated).
>>
>> Not sure where to start looking.
>
> Hmm... sorry about that. Can you please boot with `no_console_suspend` and
> retry? Once the system gets stuck, you can wait for several minutes till the
> workqueue watchdog triggers and dumps the state or, if you can, trigger
> `sysrq-t` which has a workqueue state dump at the end.
>
> If the system doesn't become live enough after a suspend/resume cycle to get
> more info, the following might help:

Looks like it's too far gone for that, indeed.

>
> $ echo test_resume > /sys/power/disk
> $ echo disk > /sys/power/state

Sadly, hibernation is not a thing on this platform. Without going into much
detail of how messy the power management stuff is, the only available states
are "on", "off" or "power collapsed" (bound to s2idle). Trying to trigger this
sequence makes the thing lock up and die due to unclocked accesses, with or
without the WQ regression.

Konrad