Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state
From: Petr Mladek
Date: Fri Jan 30 2026 - 11:28:37 EST
On Fri 2026-01-30 16:56:56, Petr Mladek wrote:
> On Thu 2026-01-29 10:34:53, ysard wrote:
> > Summary
> > =======
> >
> > The patch works only when I *uncomment* the 2 `synchronize_srcu(&console_srcu);` lines!
>
> > Works now!:
> > $ sudo sh -c "
> > mkdir -p /var/run/nvidia-sleep \
> > && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
> > && chvt 63 \
> > && echo suspend >/proc/driver/nvidia/suspend \
> > && systemctl suspend"
> >
> > Logs:
> > [ 338.901995] [ T3134] printk: Suspending console(s) (use no_console_suspend to debug)
> > [ 338.901997] [ T3134] printk: console_suspend_all
> > [ 338.932763] [ T2672] printk: console_trylock
> > [ 338.948664] [ T2659] printk: console_trylock
> > [ 338.948685] [ T2671] printk: console_trylock
> > [ 338.950716] [ T272] printk: console_trylock
> > [ 338.950747] [ T270] printk: console_trylock
> > [ 338.982194] [ T3134] printk: console_trylock
> > [ 339.020910] [ T3134] printk: console_trylock
> > [ 339.044613] [ T158] printk: console_trylock
> > [ 339.132842] [ T3134] printk: console_trylock
[...]
> > [ 339.614444] [ T9] printk: console_trylock
> > [ 339.634304] [ T3149] printk: console_trylock
> > [ 339.942244] [ T2673] printk: console_trylock
> > [ 340.123772] [ T3134] printk: console_resume_all
> >
> > Proc comm values:
> > 1127: kworker/3:2-cgroup_offline
> > 158: kworker/5:1-mm_percpu_wq
> > 2659: kworker/u32:17-async
> > 2661: kworker/u32:19-async
> > 2671: kworker/u32:29-async
> > 2672: kworker/u32:30-async
> > 2673: kworker/u32:31-kvfree_rcu_reclaim
> > 270: scsi_eh_1
> > 272: scsi_eh_2
> > 310: kworker/2:2-events
> > 3134: not exists or not readable
> > 3149: kworker/u32:38-flush-253:3
> > 65: kworker/u32:2-async
> > 66: kworker/u32:3-async
> > 67: kworker/u32:4-async
> > 9: kworker/0:0-events
>
> It is hard to know what is going there. I guess that many
> console_trylock() calls are from printk(). But they might also
> be from tty or from the nvidia driver code.
>
> I have tried to create a patch which would print backtraces
> of the callers. The output might be interesting. I am going
> to send it in a separate mail.

Please find the patch below. It can be applied on top of
the previous one which reverted the problematic commit,
see https://lore.kernel.org/all/aXoWiJhcOaGGlcmk@xxxxxxxxxxxxxxx/

Important: You still need to explicitly uncomment the
synchronize_srcu() calls.

It would be nice to provide three logs with this patch:

  1. console_lock() API called by

       echo suspend >/proc/driver/nvidia/suspend

  2. console_lock() API called by the suspend test.
     No freeze (expected):

       $ sudo sh -c "
       mkdir -p /var/run/nvidia-sleep \
       && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
       && chvt 63 \
       && systemctl suspend"

  3. No freeze with the 1st patch and uncommented synchronize_srcu() calls:

       $ sudo sh -c "
       mkdir -p /var/run/nvidia-sleep \
       && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
       && chvt 63 \
       && echo suspend >/proc/driver/nvidia/suspend \
       && systemctl suspend"
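
In case it helps with collecting logs 2 and 3: here is a minimal
sketch of a shell helper which cuts the suspend window out of a saved
kernel log. The function name suspend_window and the output file name
are only placeholders. It assumes that the log contains the
"console_resume_all" debug message seen in the log quoted above. It does
not apply to log 1, where no suspend happens and the whole dmesg
output is needed.

```shell
# Hypothetical helper: read a kernel log on stdin and print the lines
# from "Suspending console(s)" up to and including "console_resume_all".
suspend_window() {
    awk '/printk: Suspending console\(s\)/ { on = 1 }
         on { print }
         /printk: console_resume_all/ { on = 0 }'
}

# Example use after resume (the file name is just an illustration):
#   sudo dmesg | suspend_window > log-3-nvidia-suspend.txt
```

Anything printed before the suspend or after the resume is dropped,
so the three logs stay short and comparable.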

Feel free to do your own tests. But please do not spend too much
time on it.

It would be nice to find the culprit. In particular, I would like to
know whether the problem is in the core kernel code (printk,
suspend, tty) or in the nvidia driver.

But we might also just need to restore the original behavior
of the console lock API during suspend. I mean the shortcuts.
They are a kind of optimization anyway.

But as I said, it would be nice to understand the problem. There
might be a race or an infinite loop somewhere and the original
console_lock API behavior just hides the problem.

OK, here is the patch which printed backtraces of console_lock
API callers around the suspend for me.