Re: Root filesystem read access for firmware load during hibernation image writing

From: Maciej S. Szmigiero
Date: Sun Oct 06 2024 - 10:03:23 EST


On 5.10.2024 19:40, Pavel Machek wrote:
> Hi!
>
>> In my case, a USB device (RTL8821CU) gets reset at that stage due to
>> commit 04b8c8143d46 ("btusb: fix Realtek suspend/resume") and so it tries
>> to request_firmware() from the root filesystem after that thaw/reset,
>> when the hibernation image is being written.
>>
>> It usually succeeds, however often it deadlocks somewhere in Btrfs code
>> resulting in the system failing to power off after writing the hibernate
>> image:
>> power_off() calls dpm_suspend_start(), which calls dpm_prepare(), which
>> waits for device probe to finish.
>>
>> And device probe is stuck forever trying to load that USB stick firmware
>> from the filesystem - so in the end the system never powers off during
>> (after) hibernation.
>>
>> That's why I wonder whether this firmware load is supposed to work correctly
>> during that hibernation state and so the system may be hitting some kind of
>> a swsusp/btrfs/block layer race condition.
>>
>> Or, alternatively, maybe reading files is not supported at this point and
>> so this is really a btrtl/rtw88 bug?

> I'd say not supported at this point. Reading a file may still lead to
> an atime update, etc., and we really can't support that easily.

Thanks for this clarification.

I've dropped btrfs folks from the CC list since this isn't a btrfs issue
after all and added rtw88/btrtl maintainers instead.

> Suggestion is to keep firmware cached in memory, or at least cache it
> in memory when hibernation begins.

Since a WiFi/BT NIC is hardly useful for the hibernation snapshot writing
operation, it seems that an easier option would be to simply return
something like -EPROBE_DEFER from both the rtw88 and btrtl probe callbacks
during the PMSG_THAW hibernation stage.
That -EPROBE_DEFER will hopefully handle the unlikely case that the
hibernation snapshot writing fails or someone is running a
HIBERNATION_TEST_RESUME.
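
To illustrate the idea, here is a rough userspace model of such a probe
callback. It is only a sketch, not kernel code: image_write_in_progress,
mock_load_firmware() and mock_probe() are hypothetical names made up for
this example, and EPROBE_DEFER is defined locally with the same value the
kernel uses.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same numeric value the kernel uses for EPROBE_DEFER. */
#define EPROBE_DEFER 517

/* Hypothetical stand-in for "the hibernation image is being written". */
static bool image_write_in_progress;

/* Hypothetical loader: pretends to read a firmware file from the rootfs. */
static int mock_load_firmware(const char *name)
{
	return name != NULL ? 0 : -1;
}

/*
 * Model of a probe callback that refuses to touch the filesystem while
 * the image is being written and asks to be retried later instead.
 */
static int mock_probe(const char *fw_name)
{
	if (image_write_in_progress)
		return -EPROBE_DEFER;

	return mock_load_firmware(fw_name);
}
```

With the flag set, mock_probe() returns -EPROBE_DEFER without calling the
loader; once it is cleared (image written, or hibernation aborted), a
retried probe loads the firmware normally.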

In turn, the easiest trigger for this would be the "in_suspend"
variable being set, however this would require EXPORTing it - it looks
like system_entering_hibernation() only covers the case when the
system is hibernating using platform hibernation support.

I will see whether this workaround works for me, if someone wants
to implement the "firmware caching" approach instead then feel
free to do so.
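
For whoever picks that up, the caching idea could be modeled roughly as
below. Again this is a userspace sketch under my own assumptions, not the
kernel firmware loader: rootfs_readable, read_fw_from_disk(),
fw_cache_fill(), fw_get_cached() and fw_cache_drop() are all hypothetical
names, and the 4-byte blob is placeholder data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct fw_cache_entry {
	unsigned char *data;
	size_t size;
};

/* Hypothetical: false models the window in which file reads are unsafe. */
static bool rootfs_readable = true;
static struct fw_cache_entry fw_cache;

/* Hypothetical disk read; fails while the rootfs must not be touched. */
static int read_fw_from_disk(unsigned char **out, size_t *size)
{
	static const unsigned char blob[] = { 0xde, 0xad, 0xbe, 0xef };

	if (!rootfs_readable)
		return -1;
	*out = malloc(sizeof(blob));
	if (!*out)
		return -1;
	memcpy(*out, blob, sizeof(blob));
	*size = sizeof(blob);
	return 0;
}

/* Called when hibernation begins, while the rootfs is still readable. */
static int fw_cache_fill(void)
{
	if (fw_cache.data)
		return 0;	/* already cached */
	return read_fw_from_disk(&fw_cache.data, &fw_cache.size);
}

/* Returns the cached image or NULL; never goes to the filesystem. */
static const unsigned char *fw_get_cached(size_t *size)
{
	if (!fw_cache.data)
		return NULL;
	*size = fw_cache.size;
	return fw_cache.data;
}

/* Called once hibernation has finished or was aborted. */
static void fw_cache_drop(void)
{
	free(fw_cache.data);
	fw_cache.data = NULL;
	fw_cache.size = 0;
}
```

The point of the split is that only fw_cache_fill() ever reads from disk,
so a device reset during image writing can be served from memory.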


By the way, I don't see any reason why other USB devices that load
firmware at their probe time can't be affected too since that
lock_device_hotplug() call in hibernate() seems to only prevent
CPU/RAM/ACPI hotplug, not USB hotplug.

So if such a USB device happens to get reset during hibernation
(for example due to hub EMI), it would suffer from the same issue.

> BR,
> Pavel


Thanks,
Maciej