Re: PCI / PM: Crashes in PME scan during system suspend
From: Lukas Wunner
Date: Tue Apr 18 2017 - 14:39:34 EST
On Tue, Apr 18, 2017 at 04:06:27PM +0200, Rafael J. Wysocki wrote:
> On Tuesday, April 18, 2017 08:49:39 AM Geert Uytterhoeven wrote:
> > On Sun, Apr 16, 2017 at 9:55 AM, Lukas Wunner <lukas@xxxxxxxxx> wrote:
> > > Subject: [PATCH] PCI: Freeze PME scan before suspending devices
> > >
> > > Laurent Pinchart reported that the Renesas R-Car H2 Lager board
> > > (r8a7790) crashes during suspend tests. Geert Uytterhoeven managed to
> > > reproduce the issue on an M2-W Koelsch board (r8a7791):
> > >
> > > It occurs when the PME scan runs, once per second. During PME scan, the
> > > PCI host bridge (rcar-pci) registers are accessed while its module clock
> > > has already been disabled, leading to the crash.
> > >
> > > The issue only occurs during suspend tests, after writing either
> > > "platform" or "processors" to /sys/power/pm_test. It does not happen
> > > (or is less likely to happen) during full system suspend ("core" or
> > > "none") because system suspend also disables timers, and thus the
> > > workqueue handling PME scans no longer runs. Geert believes the issue
> > > may still happen in the small window between disabling module clocks
> > > and disabling timers.
> >
> > It can also be reproduced easily by configuring s2ram to use s2idle instead
> > of deep suspend, which is a real-world use case:
> >
> > # echo 0 > /sys/module/printk/parameters/console_suspend
> > # echo s2idle > /sys/power/mem_sleep
> > # echo mem > /sys/power/state
> >
> > Tested-by: Geert Uytterhoeven <geert+renesas@xxxxxxxxx>
>
> There is a small concern here that some wakeup events may be missed if they
> are delivered via PME without a working IRQ, but that's fairly minor and it
> cannot be avoided entirely, so

Well, that's a conundrum. I don't know which devices depend on PME polling
and whether they may signal PME between freezing the workqueue and suspending
the host bridge. If this unexpectedly turns out to be a problem in practice,
it might be possible to solve it by calling pci_pme_list_scan() once directly
from one of the host bridge driver's pm_ops callbacks, before its module
clock is disabled.
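
To illustrate the ordering, here is a toy userspace model in C. It is
hypothetical: struct pm_model, pme_scan(), timer_tick(),
host_bridge_suspend() and simulate_suspend() are made-up names, not kernel
APIs, and the model ignores concurrency entirely. It assumes the periodic
scan touches host bridge registers, that the fix freezes the scan before
devices are suspended, and that the host bridge's suspend callback does one
final direct scan before gating its clock, as suggested above.

```c
#include <stdbool.h>

/* Hypothetical model state, not a kernel structure. */
struct pm_model {
	bool clock_enabled;	/* host bridge module clock */
	bool scan_frozen;	/* periodic PME scan frozen */
	int scans;		/* scans done with the clock running */
	int bad_accesses;	/* register reads with the clock gated */
};

/* One PME scan pass: reads host bridge registers. */
static void pme_scan(struct pm_model *m)
{
	if (m->clock_enabled)
		m->scans++;
	else
		m->bad_accesses++;	/* would crash on real hardware */
}

/* Once-per-second tick, as the scan work would run. */
static void timer_tick(struct pm_model *m)
{
	if (!m->scan_frozen)
		pme_scan(m);
}

/* Hypothetical host bridge suspend callback. */
static void host_bridge_suspend(struct pm_model *m)
{
	pme_scan(m);		/* catch PMEs signalled after the freeze */
	m->clock_enabled = false;
}

/* Runs normal operation followed by a suspend sequence. */
static struct pm_model simulate_suspend(bool freeze_scan_first)
{
	struct pm_model m = { .clock_enabled = true };

	timer_tick(&m);			/* normal operation */
	if (freeze_scan_first)
		m.scan_frozen = true;	/* freeze before suspending devices */
	timer_tick(&m);			/* tick before the bridge suspends */
	host_bridge_suspend(&m);	/* final scan, then clock gated */
	timer_tick(&m);			/* timers still run (pm_test, s2idle) */
	return m;
}
```

With the freeze, simulate_suspend(true) ends with no register access while
the clock is gated; simulate_suspend(false) records one bad access from the
tick that fires after the clock is off.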

I've amended the commit message with the tags and additional information
provided by you and Geert and will resend the patch to the list shortly.

Thanks,
Lukas