Re: 3.16-rcX crashes on resume from Suspend-To-RAM

From: Zhang Rui
Date: Sun Aug 17 2014 - 21:11:55 EST


On Sat, 2014-08-16 at 02:46 +0200, Rafael J. Wysocki wrote:
> On Friday, August 15, 2014 10:17:42 AM Markus Gutschke wrote:
> > Just wondering if any of you had any other ideas of what I could try
> > to help debug this problem?
>
> My theory is that there is a device in your system that we don't have a driver
> for, but it had been enumerated as a PNP device before the change that triggered
> the problem for you and we turned it off during suspend as part of the default
> ACPI PNP device handling.

I had the same assumption before, thus I checked the difference of
platform devices and pnp devices, and found that there are three devices
enumerated to platform bus instead of PNP bus after the ACPI enumeration
rework patches. They are PNP0800, PNP0200 and PNP0C04 devices, thus I
made a debug patch to add those ids to the acpi_pnp scan handler id list
so that they will stay in PNP bus. But the problem still exists after
applying the debug patch.
>
> The reason why you're seeing a crash with the "platform" test level is most
> likely that the _WAK control method does something unusual on your system.
>
an easy way to check this is to apply the debug patch attached and
re-test "platform" test level.

thanks,
rui
> The LNXSYBUS:00 thing from dmesg probably is a red herring.
>
> I need the output of acpidump from the affected system, but please attach it
> to the bug entry at https://bugzilla.kernel.org/show_bug.cgi?id=80911 that
> Rui has created for this issue.
>
> Also please check the list of PNP devices under
>
> /sys/bus/pnp/devices/
>
> before and after the commit you have found by bisection and let me know if
> there are any differences.
>
>
> > On Tue, Aug 12, 2014 at 9:11 AM, Markus Gutschke <markus@xxxxxxxxxxxx> wrote:
> > > As I said earlier in this thread, echo'ing "devices" into "pm_test"
> > > does not result in a crash; but doing so for "platform" does.
> > >
> > > Markus
> > >
> > > On Aug 12, 2014 1:26 AM, "Zhang Rui" <rui.zhang@xxxxxxxxx> wrote:
> > >>
> > >> On Sat, 2014-08-09 at 03:14 -0700, Markus Gutschke wrote:
> > >> > I am back and have physical access to the machine now.
> > >> >
> > >> great!
> > >>
> > >> > I re-ran the test just to be sure, and I can confirm that "platform"
> > >> > does in fact result in a crash.
> > >> >
> > >> what about "devices"?
> > >> I mean
> > >>
> > >> # echo devices > /sys/power/pm_test
> > >>
> > >> and see if that triggers the crash.
> > >>
> > >> > Furthermore, I ran the test that Rui asked for. I suspended, resumed,
> > >> > and upon crashing power-cycled the machine ASAP. "dmesg" suggests that
> > >> > the problem is with LNXSYBUS:00 That doesn't tell me much, but
> > >> > hopefully it makes sense to you guys.
> > >> >
> > >> [ 0.930093] Magic number: 10:810:122
> > >> [ 0.930185] acpi LNXSYBUS:00: hash matches
> > >>
> > >> This looks weird, ACPI will do nothing for LNXSYBUS devices during
> > >> resume.
> > >> Rafael, any thought on this?
> > >>
> > >> thanks,
> > >> rui
> > >>
>