Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]

From: Rafael J. Wysocki
Date: Tue May 17 2016 - 19:15:04 EST


On 5/16/2016 9:39 PM, Ville Syrjälä wrote:
On Wed, May 11, 2016 at 04:34:06PM +0300, Ville Syrjälä wrote:
On Wed, May 11, 2016 at 08:44:45AM -0400, Steven Rostedt wrote:
On Wed, 11 May 2016 15:21:16 +0300
Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> wrote:

Yeah, I can't get anything out of the machine at that point. netconsole
didn't help either, and there's no serial port on this machine. IIRC
I've also tried ramoops on this thing in the past, but unfortunately
the memory got cleared on reboot.

Can you look at the documentation in the kernel source at
Documentation/power/basic-pm-debugging.txt and follow the procedures
for testing suspend to RAM (although that mostly means running the
same tests as for hibernation).

You can also use the s2ram tool for this.

See Documentation/power/s2ram.txt

Perhaps this can shed a bit more light on the problem.

Basically, the above performs a partial suspend and resume, and can
narrow the problem down to a more specific area.

All the pm_test modes work fine. The only difference between them was
that 'platform' required me to manually wake up the machine (hitting a
key was sufficient), whereas the others woke up without help.
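For reference, the pm_test procedure from basic-pm-debugging.txt boils down to writing a test mode to /sys/power/pm_test and then writing "mem" to /sys/power/state. Below is a minimal C sketch of that sequence. The helper names are hypothetical, and the sysfs paths are passed in as parameters so the logic can be tried against ordinary files instead of actually suspending the box.

```c
#include <stdio.h>
#include <string.h>

/* "none" disables test mode; the rest are the pm_test modes from
 * Documentation/power/basic-pm-debugging.txt, shallowest (freezer)
 * to deepest (core). */
static const char *const pm_test_modes[] = {
	"none", "freezer", "devices", "platform", "processors", "core",
};

/* Return 1 if mode is one of the known pm_test modes, 0 otherwise. */
static int pm_test_mode_is_valid(const char *mode)
{
	size_t n = sizeof(pm_test_modes) / sizeof(pm_test_modes[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(mode, pm_test_modes[i]) == 0)
			return 1;
	return 0;
}

/* Write value to a sysfs-style attribute. Returns 0 on success, -1 on error. */
static int write_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Select a pm_test mode, then request suspend to RAM ("mem"). On a real
 * system the paths would be /sys/power/pm_test and /sys/power/state. */
static int run_pm_test(const char *pm_test_path, const char *state_path,
		       const char *mode)
{
	if (!pm_test_mode_is_valid(mode))
		return -1;
	if (write_attr(pm_test_path, mode) != 0)
		return -1;
	return write_attr(state_path, "mem");
}
```

On a real machine (as root) this would be called as run_pm_test("/sys/power/pm_test", "/sys/power/state", "platform"), and writing "none" to pm_test afterwards turns the test mode off again.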

pm_trace gave me
[ 1.306633] Magic number: 0:185:178
[ 1.322880] hash matches ../drivers/base/power/main.c:1070
[ 1.339270] acpi device:0e: hash matches
[ 1.355414] platform: hash matches

which is the TRACE_SUSPEND in __device_suspend_noirq(), so no help
there.
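The "Magic number" line packs three hashes into the one value that survived in the RTC. A small sketch of splitting it up, assuming (from my reading of drivers/base/power/trace.c) that the fields are user:file:device; the struct and function names are made up for illustration. With that reading, 185 is the file:line hash that matched main.c:1070 and 178 is the device hash matched by both "acpi device:0e" and "platform", so the hash alone cannot tell those two apart.

```c
#include <stdio.h>
#include <string.h>

/* The three fields of the pm_trace "Magic number: a:b:c" boot message.
 * Assumption: the order is user:file:device, following my reading of
 * drivers/base/power/trace.c. */
struct pm_trace_magic {
	unsigned int user;	/* value passed to TRACE_SUSPEND()/TRACE_RESUME() */
	unsigned int file;	/* hash of the file:line of the last trace point */
	unsigned int dev;	/* hash of the last traced device name */
};

/* Parse a dmesg line such as "[ 1.306633] Magic number: 0:185:178".
 * Returns 0 on success, -1 if the line carries no magic number. */
static int parse_pm_trace_magic(const char *line, struct pm_trace_magic *out)
{
	const char *p = strstr(line, "Magic number:");

	if (!p)
		return -1;
	if (sscanf(p, "Magic number: %u:%u:%u",
		   &out->user, &out->file, &out->dev) != 3)
		return -1;
	return 0;
}
```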

I guess I could try sprinkling more TRACE_RESUME()s into some early
resume code. If anyone has good ideas about where to put them, that
might speed things up a bit.

So I did a bunch of that and found that it gets stuck somewhere
around executing the _WAK method:
platform_resume_noirq
acpi_pm_finish
acpi_leave_sleep_state
acpi_hw_sleep_dispatch
acpi_hw_legacy_wake
acpi_hw_execute_sleep_method
acpi_evaluate_object
acpi_ns_evaluate
acpi_ps_execute_method
acpi_ps_parse_aml
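Sprinkling TRACE_RESUME()s works as a bisection tool because every checkpoint overwrites the same single slot (the RTC, in the real pm_trace), so whatever survives the hang and reboot is the last checkpoint reached. Below is a userspace sketch of that idea using the call chain above as checkpoints. SIM_TRACE_RESUME, simulated_resume and the other names are hypothetical, and the real macro stores a hash of file:line rather than the raw values.

```c
#include <stddef.h>

/* Userspace simulation of the pm_trace idea (hypothetical names). The real
 * TRACE_RESUME() hashes file:line into the RTC so it survives the reset;
 * here a static variable plays the role of that single slot. */
struct checkpoint {
	const char *file;
	int line;
	unsigned int user;
};

static struct checkpoint last_checkpoint;

/* Each checkpoint overwrites the previous one. */
#define SIM_TRACE_RESUME(u)                      \
	do {                                     \
		last_checkpoint.file = __FILE__; \
		last_checkpoint.line = __LINE__; \
		last_checkpoint.user = (u);      \
	} while (0)

/* One checkpoint on entry to each function of the reported call chain. */
static const char *const resume_steps[] = {
	"platform_resume_noirq", "acpi_pm_finish", "acpi_leave_sleep_state",
	"acpi_hw_sleep_dispatch", "acpi_hw_legacy_wake",
	"acpi_hw_execute_sleep_method", "acpi_evaluate_object",
	"acpi_ns_evaluate", "acpi_ps_execute_method", "acpi_ps_parse_aml",
};
#define NSTEPS (sizeof(resume_steps) / sizeof(resume_steps[0]))

/* Walk the chain, "hanging" on entry to step hang_at (1-based; 0 = no hang).
 * Returns -1 on a hang, 0 if resume completes. Afterwards,
 * last_checkpoint.user is the last step reached (NSTEPS + 1 = completed). */
static int simulated_resume(unsigned int hang_at)
{
	for (unsigned int i = 0; i < NSTEPS; i++) {
		SIM_TRACE_RESUME(i + 1);
		if (i + 1 == hang_at)
			return -1;	/* nothing after this point runs */
	}
	SIM_TRACE_RESUME((unsigned int)NSTEPS + 1);
	return 0;
}
```

Moving hang_at and re-reading last_checkpoint mirrors moving real TRACE_RESUME() calls between test suspends until the last surviving checkpoint brackets the hang.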

It also seems that adding a few TRACE_RESUME()s or an msleep() right
after enable_nonboot_cpus() can sometimes avoid the hang.

I've attached the DSDT in case anyone is interested in looking at it.


What if you comment out the execution of _WAK (line 318 of drivers/acpi/acpica/hwsleep.c in 4.6)? Does that make any difference?