Re: Hibernate stuck after recent kernel/workqueue.c changes in Stable 6.6.23

From: Linux regression tracking (Thorsten Leemhuis)
Date: Wed Apr 03 2024 - 00:26:39 EST


On 03.04.24 02:38, Tejun Heo wrote:
> On Tue, Apr 02, 2024 at 10:08:11AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
>> Hi stable team, there is a report that the recent backport of
>> 5797b1c18919cd ("workqueue: Implement system-wide nr_active enforcement
>> for unbound workqueues") [from Tejun] to 6.6.y (as 5a70baec2294) broke
>> hibernate for a user. 6.6.24-rc1 did not fix this problem; reverting the
>> culprit does.
>>
>>> With kernel 6.6.23 hibernating usually hangs here: the display stays
>>> on but the mouse pointer does not move and the keyboard does not work.
>>> But SysRq REISUB does reboot. Sometimes it seems to hibernate: the
>>> computer powers down and can be waked up and the previous display comes
>>> visible, but it is stuck there.
>>
>> See https://bugzilla.kernel.org/show_bug.cgi?id=218658 for details.
>> Note, you have to use bugzilla to reach the reporter, as I sadly[1] can
>> not CCed them in mails like this.
>>
>> Side note: there is a mainline report about problems due to
>> 5797b1c18919cd ("workqueue: Implement system-wide nr_active enforcement
>> for unbound workqueues") as well, but it's about "nohz_full=0 prevents
>> kernel from booting":
>> https://bugzilla.kernel.org/show_bug.cgi?id=218665; will forward that
>> separately to Tejun.
>
> Sorry about late reply

No problem at all, really, thx for your reply!

> but that commit is not for -stable. It does fix an
> undesirable behavior from an earlier commit, so it has the Fixes tag but it
> has a series of dependencies that need to be backported together

Not sure if you know, but the stable team apparently recently backported a
bunch of other workqueue commits as well; I see 10 from a quick look here:
https://lore.kernel.org/all/20240326223803.2647796-1-sashal@xxxxxxxxxx/

$ git log --grep='workqueue:' --oneline v6.6.22..stable/linux-6.6.y -- include/linux/workqueue.h kernel/workqueue*
7df62b8cca38aa workqueue: Don't call cpumask_test_cpu() with -1 CPU in wq_update_node_max_active()
5a70baec2294e8 workqueue: Implement system-wide nr_active enforcement for unbound workqueues
b522229a56941a workqueue: Introduce struct wq_node_nr_active
bd31fb926dfa02 workqueue: RCU protect wq->dfl_pwq and implement accessors for it
5f99fee6f2dea1 workqueue: Make wq_adjust_max_active() round-robin pwqs while activating
4023a2d9507691 workqueue: Move nr_active handling into helpers
6c592f0bb96815 workqueue: Replace pwq_activate_inactive_work() with [__]pwq_activate_work()
bad184d26a4f68 workqueue: Factor out pwq_is_empty()
82e098f5bed1ff workqueue: Move pwq->max_active to wq->max_active
43a181f8f41aca workqueue.c: Increase workqueue name length

And there is also "workqueue: Shorten events_freezable_power_efficient name"
in the queue for the next 6.6.y release:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-6.6

That leads to the question: what is the better patch forward here: pick
up what's is missing or back out?

Side note: I have no idea why the stable team backported those patches
and no option on whether that was wise, just trying to help finding the best
solution forward from the current state of things.

> which would
> be far too invasive for -stable, thus no Cc: stable.
>
> I didn't know a Fixes
> tag automatically triggers backport to -stable. I will keep that in mind for
> future.

/me fears that more and more developers due to situations like this will
avoid Fixes: tags and wonders what consequences that might have for the
kernel as a whole

Ciao, Thorsten