Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state

From: Petr Mladek

Date: Wed Jan 28 2026 - 09:07:23 EST


On Sat 2026-01-24 02:22:41, ysard wrote:
> On Fri 2026-01-23 13:19:34 +0100, Petr Mladek wrote:
> > Also I would expect that the userspace waits until the services
> > finish the job before suspending the kernel.
>
> It does:
>
> janv. 24 00:33:41 systemd[1]: Reached target sleep.target - Sleep.
> janv. 24 00:33:41 systemd[1]: Starting nvidia-suspend.service - NVIDIA system suspend actions...
> janv. 24 00:33:41 suspend[51525]: nvidia-suspend.service
> janv. 24 00:33:41 logger[51525]: <13>Jan 24 00:33:41 suspend: nvidia-suspend.service
> janv. 24 00:33:42 kernel: audit: type=1400 audit(1769211222.373:2351): apparmor="ALLOWED" operation="open" class="file" profile="Xorg" name="/dev/nvidiactl" pid=1441 comm="Xorg" requested_mask="wr" denied_mask="wr" fsuid=0 ouid=0
> janv. 24 00:33:42 kernel: audit: type=1400 audit(1769211222.969:2352): apparmor="ALLOWED" operation="open" class="file" profile="Xorg" name="/dev/nvidiactl" pid=1441 comm="Xorg" requested_mask="wr" denied_mask="wr" fsuid=0 ouid=0
> janv. 24 00:33:45 systemd[1]: nvidia-suspend.service: Deactivated successfully.
> janv. 24 00:33:45 systemd[1]: Finished nvidia-suspend.service - NVIDIA system suspend actions.
> janv. 24 00:33:45 systemd[1]: Starting systemd-suspend.service - System Suspend...
> janv. 24 00:33:45 systemd[1]: session-1.scope: Unit now frozen-by-parent.
> janv. 24 00:33:45 systemd[1]: user@1000.service: Unit now frozen-by-parent.
> janv. 24 00:33:45 systemd[1]: user-1000.slice: Unit now frozen-by-parent.
> janv. 24 00:33:45 systemd[1]: user.slice: Unit now frozen.
> janv. 24 00:33:45 systemd-sleep[51562]: Successfully froze unit 'user.slice'.
> janv. 24 00:33:45 systemd-sleep[51562]: Performing sleep operation 'suspend'...
> janv. 24 00:33:45 kernel: PM: suspend entry (deep)

OK.

> Yes, I have a reproducible pattern here, with the service disabled.
> The service `nvidia-resume.service` (which basically calls the script
> with the 'resume' argument) is expected to start if the resume is
> completed, but the system does not reach this stage during the freeze.
>
> No freeze:
> $ sudo sh -c "
> mkdir -p /var/run/nvidia-sleep \
> && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
> && chvt 63 \
> && systemctl suspend"
>
> Freeze:
> $ sudo sh -c "
> mkdir -p /var/run/nvidia-sleep \
> && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
> && chvt 63 \
> && echo suspend >/proc/driver/nvidia/suspend \
> && systemctl suspend"
>
> So the problem is related to this command:
> $ echo suspend >/proc/driver/nvidia/suspend
>
> Note that without the systemctl call, this command suspends and wakes up the GPU correctly:
> $ sudo sh -c "
> chvt 63 \
> && echo suspend >/proc/driver/nvidia/suspend; \
> sleep 4; \
> echo resume >/proc/driver/nvidia/suspend; \
> chvt 2"

Interesting. It looks like the nvidia suspend does something which
breaks the system suspend. But the driver is able to revert it when
"resume" is written without a system suspend in between...

To be honest, I do not have any theory which could explain this.

But I have found a bug in John's debug patch from
https://lore.kernel.org/all/877bts1ltv.fsf@xxxxxxxxxxxxxxxxxxxxx/

The patch tried to restore the original behavior on current mainline.
But the console_suspend()/console_resume() functions have been renamed
recently to console_suspend_all()/console_resume_all(). The original
names have been reused for the console-specific suspend/resume variants,
see
https://lore.kernel.org/all/20250226-printk-renaming-v1-0-0b878577f2e6@xxxxxxxx/

Also, the debug patch did not revert the synchronize_srcu() calls.
I guess that this was intentional. But I would rather revert them
as well because synchronize_srcu() is a potentially blocking operation.

Could you please test it with this fixed version of the debug patch?

If the patch helps, by chance, then please try to uncomment
the synchronize_srcu() calls and check if it still works.
I wonder if they make a difference.
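When repeating the test with and without the patch, it might help to run the two reproducers from the report in exactly the same way. Below is a minimal sketch, assuming the paths and VT numbers from the report. The script, the repro_cmds function and the "run" argument are hypothetical and not part of the thread. By default it only prints the steps. The "freeze" and "nofreeze" variants differ only in the write to /proc/driver/nvidia/suspend.

```shell
#!/bin/sh
# Hypothetical helper, not from the thread: print the reproducer steps for
# one variant ("freeze" or "nofreeze"); they differ only in the nvidia write.
set -eu

repro_cmds() {
    echo "mkdir -p /var/run/nvidia-sleep"
    echo "echo 2 > /var/run/nvidia-sleep/Xorg.vt_number"
    echo "chvt 63"
    if [ "$1" = freeze ]; then
        # The step that the report identified as the trigger.
        echo "echo suspend > /proc/driver/nvidia/suspend"
    fi
    echo "systemctl suspend"
}

# Execute the steps only when "run" is given as the second argument (needs
# root); "sh -e" stops at the first failing step, like the "&&" chain.
if [ "${2:-}" = run ]; then
    repro_cmds "${1:-freeze}" | sh -e
else
    repro_cmds "${1:-freeze}"
fi
```

For example, "sudo sh repro.sh freeze run" would execute the freezing variant. Passing the mode as an argument instead of an environment variable avoids depending on how sudo filters the environment.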