Re: [PATCH v2] net: wwan: iosm: Fix hibernation by re-binding the driver around it

From: Maciej S. Szmigiero
Date: Wed Jan 08 2025 - 15:18:40 EST


On 8.01.2025 21:17, Rafael J. Wysocki wrote:
On Wed, Jan 8, 2025 at 9:04 PM Maciej S. Szmigiero
<mail@xxxxxxxxxxxxxxxxxxxxx> wrote:

On 8.01.2025 20:51, Bjorn Helgaas wrote:
[+cc Rafael, linux-pm because they *are* PM experts :)]

On Wed, Jan 08, 2025 at 02:15:28AM +0200, Sergey Ryazanov wrote:
On 08.01.2025 01:45, Bjorn Helgaas wrote:
On Wed, Jan 08, 2025 at 01:13:41AM +0200, Sergey Ryazanov wrote:
On 05.01.2025 19:39, Maciej S. Szmigiero wrote:
Currently, the driver is seriously broken with respect to the
hibernation (S4): after image restore the device is back into
IPC_MEM_EXEC_STAGE_BOOT (which AFAIK means bootloader stage) and needs
full re-launch of the rest of its firmware, but the driver restore
handler treats the device as merely sleeping and just sends it a
wake-up command.

This wake-up command times out but device nodes (/dev/wwan*) remain
accessible.
However attempting to use them causes the bootloader to crash and
enter IPC_MEM_EXEC_STAGE_CD_READY stage (which apparently means "a crash
dump is ready").

It seems that the device cannot be re-initialized from this crashed
stage without toggling some reset pin (on my test platform that's
apparently what the device _RST ACPI method does).

While it would theoretically be possible to rewrite the driver to tear
down the whole MUX / IPC layers on hibernation (so the bootloader does
not crash from improper access) and then re-launch the device on
restore this would require significant refactoring of the driver
(believe me, I've tried), since there are quite a few assumptions
hard-coded in the driver about the device never being partially
de-initialized (like channels other than devlink cannot be closed,
for example).
Probably this would also need some programming guide for this hardware.

Considering that the driver seems orphaned [1] and other people are
hitting this issue too [2] fix it by simply unbinding the PCI driver
before hibernation and re-binding it after restore, much like
USB_QUIRK_RESET_RESUME does for USB devices that exhibit a similar
problem.

Tested on XMM7360 in HP EliteBook 855 G7 both with s2idle (which uses
the existing suspend / resume handlers) and S4 (which uses the new code).

[1]: https://lore.kernel.org/all/c248f0b4-2114-4c61-905f-466a786bdebb@xxxxxxxxxxxxx/
[2]:
https://github.com/xmm7360/xmm7360-pci/issues/211#issuecomment-1804139413

Signed-off-by: Maciej S. Szmigiero <mail@xxxxxxxxxxxxxxxxxxxxx>

Generally looks good to me. Lets wait for approval from PCI
maintainers to be sure that there no unexpected side effects.

I have nothing useful to contribute here. Seems like kind of a
mess. But Intel claims to maintain this, so it would be nice if
they would step up and make this work nicely.

Suddenly, Intel lost their interest in the modems market and, as
Maciej mentioned, the driver was abandon for a quite time now. The
author no more works for Intel. You will see the bounce.

Well, that's unfortunate :) Maybe step 0 is to remove the Intel
entry from MAINTAINERS for this driver.

Bjorn, could you suggest how to deal easily with the device that is
incapable to seamlessly recover from hibernation? I am totally
hopeless regarding the PM topic. Or is the deep driver rework the
only option?

I'm pretty PM-illiterate myself. Based on
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/pm/sleep-states.rst?id=v6.12#n109,
I assume that when we resume after hibernate, devices are in the same
state as after a fresh boot, i.e., the state driver .probe() methods
see.

So I assume that some combination of dev_pm_ops methods must be able
to do basically the same as .probe() to get the device usable again
after it was completely powered off and back on.

You are right that it should be theoretically possible to fix this issue
by re-initializing the driver in the hibernation restore/thaw callbacks
and I even have tried to do so in the beginning.

But as I wrote in this patch description, doing so would need significant
refactoring of the driver as it is not currently capable of being
de-initialized and re-initialized partially.

Hence this patch approach of simply re-binding the driver which also
seemed safer in the absence of any real programming docs for this hardware.

While this may not be elegant, it may actually get the job done.

Can you please resend the patch with a CC to linux-pm@xxxxxxxxxxxxxxx?

Will do.

Thanks,
Maciej