Re: [PATCH] PCI: PM: Avoid possible suspend-to-idle issue

From: Rafael J. Wysocki
Date: Tue Jun 11 2019 - 05:08:33 EST


On Tue, Jun 11, 2019 at 10:39 AM Kai-Heng Feng
<kai.heng.feng@xxxxxxxxxxxxx> wrote:
>
> Hi Rafael,
>
> at 19:02, Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
>
> > On Friday, May 17, 2019 11:08:50 AM CEST Rafael J. Wysocki wrote:
> >> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >>
> >> If a PCI driver leaves the device handled by it in D0 and calls
> >> pci_save_state() on the device in its ->suspend() or ->suspend_late()
> >> callback, it can expect the device to stay in D0 over the whole
> >> s2idle cycle. However, that may not be the case if there is a
> >> spurious wakeup while the system is suspended, because in that case
> >> pci_pm_suspend_noirq() will run again after pci_pm_resume_noirq()
> >> which calls pci_restore_state(), via pci_pm_default_resume_early(),
> >> so state_saved is cleared and the second iteration of
> >> pci_pm_suspend_noirq() will invoke pci_prepare_to_sleep() which
> >> may change the power state of the device.
> >>
> >> To avoid that, add a new internal flag, skip_bus_pm, that will be set
> >> by pci_pm_suspend_noirq() when it runs for the first time during the
> >> given system suspend-resume cycle if the state of the device has
> >> been saved already and the device is still in D0. Setting that flag
> >> will cause the next iterations of pci_pm_suspend_noirq() to set
> >> state_saved for pci_pm_resume_noirq(), so that it always restores the
> >> device state from the originally saved data, and avoid calling
> >> pci_prepare_to_sleep() for the device.
> >>
> >> Fixes: 33e4f80ee69b ("ACPI / PM: Ignore spurious SCI wakeups from
> >> suspend-to-idle")
> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>
> I just found out this patch has a chance to freeze or reboot the system
> during suspend cycles.
> What information do you need to debug?

It would be good to narrow down the failure to a particular
transition, for example.

In particular, if that happens during the dpm_noirq_resume_devices()
call made from s2idle_loop(), it may be necessary to also skip
pci_pm_default_resume_early() for the devices with skip_bus_pm set.
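
To make that scenario concrete, here is a rough user-space model of the
flag handling described in the changelog. It is only a sketch, not the
patch and not kernel code: every name in it (model_dev, model_suspend_noirq(),
model_resume_noirq(), model_s2idle_loop(), the skip_restore_if_flagged
argument) is made up for illustration, and the "restore" step merely
stands in for what pci_pm_default_resume_early() would do.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only: none of these names are kernel APIs. */
enum model_power { MODEL_D0, MODEL_D3HOT };

struct model_dev {
	enum model_power state;	/* current power state of the device */
	bool state_saved;	/* config space has been saved */
	bool skip_bus_pm;	/* bus-level PM must leave the device alone */
	int early_restores;	/* how many times the restore step ran */
};

/* Models the noirq suspend step as the changelog describes it. */
static void model_suspend_noirq(struct model_dev *d)
{
	if (d->skip_bus_pm) {
		/* Later iteration: reuse the originally saved data, keep D0. */
		d->state_saved = true;
	} else if (d->state_saved) {
		/* First iteration: the driver saved state and left D0 alone. */
		if (d->state == MODEL_D0)
			d->skip_bus_pm = true;
	} else {
		/* Nothing saved by the driver: the bus picks a sleep state. */
		d->state_saved = true;
		d->state = MODEL_D3HOT;
	}
}

/*
 * Models the noirq resume step. With skip_restore_if_flagged set, the
 * restore is skipped for flagged devices (the hypothetical variant
 * suggested above); otherwise it always runs and clears state_saved.
 */
static void model_resume_noirq(struct model_dev *d, bool skip_restore_if_flagged)
{
	if (skip_restore_if_flagged && d->skip_bus_pm)
		return;

	d->state = MODEL_D0;
	d->state_saved = false;
	d->early_restores++;
}

/* One suspend followed by a number of spurious wakeups. */
struct model_dev model_s2idle_loop(int spurious_wakeups,
				   bool skip_restore_if_flagged)
{
	/* The driver left the device in D0 and saved its state already. */
	struct model_dev d = { MODEL_D0, true, false, 0 };
	int i;

	assert(spurious_wakeups >= 0);

	model_suspend_noirq(&d);
	for (i = 0; i < spurious_wakeups; i++) {
		model_resume_noirq(&d, skip_restore_if_flagged);
		model_suspend_noirq(&d);
	}
	return d;
}
```

In this model the device stays in D0 either way; the only difference
between model_s2idle_loop(n, false) and model_s2idle_loop(n, true) is
whether the restore step runs on every spurious wakeup, which is the
part that would be worth ruling in or out on the affected system.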

How many devices on the affected system end up with skip_bus_pm set,
for that matter?