Re: [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation

From: Rafael J. Wysocki
Date: Mon Jul 11 2022 - 14:13:36 EST


On Sun, Jul 10, 2022 at 4:25 AM Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>
> syzbot is reporting hung task at misc_open() [1], for there is a race
> window of AB-BA deadlock which involves probe_count variable.
>
> Even with "char: misc: allow calling open() callback without misc_mtx
> held" and "PM: hibernate: call wait_for_device_probe() without
> system_transition_mutex held", wait_for_device_probe() from snapshot_open()
> can sleep forever if probe_count cannot become 0.
>
> Since snapshot_open() is a userland-driven hibernation/resume request,
> it should be acceptable to fail if something is wrong.

Not really.

If you are resuming from hibernation and the image cannot be reached
(which is the situation described above), failing and continuing to
boot means discarding the image and possibly losing user data.

There is no "graceful failure" in this case.

> Users would not want to wait for hours if device stopped responding.

If the device holding the image is not responding, we had better
wait for it or panic(), or let the user reboot the system.