Machine crashes right *after* ~successful resume
From: Wilmer van der Gaast
Date: Tue Oct 07 2014 - 19:32:07 EST
Hello,
Rafael, including you on this since
http://linuxconcloudopenna2013.sched.org/event/d708f47d07cd44b9669610778c024708#.VDRzTDS_EUF
mentions you as the maintainer for Linux + power management. I hope this
is still accurate.
Since Linux 3.12 (Debian version 3.12.9-1~bpo70+1) and all the way up to
3.16 (Debian version 3.16.3-2), I'm having suspend-resume issues on my
machine (Intel Z68, i7-3770K) that are somewhat less obvious.
After every boot, I get two successful suspend+resume cycles, but after
the third suspend, it won't resume successfully. I've never had anything
useful logged on the VGA console, but I've had more luck over the serial
console. I seem to get as far as:
[ 153.787678] PM: resume of devices complete after 3797.737 msecs
[ 153.787775] PM: resume devices took 3.796 seconds
[ 154.238612] Restarting tasks ... done.
And indeed, while testing I was running a "ping -i0.01" to a host on my
network, and it managed to get a few packets out. Timing already seems
quite off though:
22:11:49.515489 IP 192.168.44.101 > 192.168.44.100: ICMP echo request, id 3074, seq 894, length 64
22:11:49.982265 IP 192.168.44.101 > 192.168.44.100: ICMP echo request, id 3074, seq 895, length 64
22:11:50.986779 IP 192.168.44.101 > 192.168.44.100: ICMP echo request, id 3074, seq 896, length 64
Note the gaps that are 0.4-1.0s instead of the 0.01s they should've
been. To me these pings going *out* sound like userland's definitely
waking up for a while, or at least some processes are. Also, for several
seconds even during earlier stages of the resume, the machine is already
responding to echo requests.
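
As a quick check of those gaps, here is a minimal sketch that computes the
spacing between the three tcpdump timestamps quoted above. The function
name gaps_seconds is made up for illustration, and the sketch assumes all
timestamps fall on the same day (no midnight wrap).

```python
from datetime import datetime

# Timestamps copied from the three tcpdump lines quoted above.
timestamps = ["22:11:49.515489", "22:11:49.982265", "22:11:50.986779"]

def gaps_seconds(stamps):
    """Return the gaps, in seconds, between consecutive HH:MM:SS.ffffff stamps.

    Hypothetical helper; assumes all stamps are from the same day.
    """
    parsed = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]

gaps = gaps_seconds(timestamps)
for gap in gaps:
    # ping -i0.01 should send a packet roughly every 0.01 s.
    print(f"{gap:.6f} s between packets (expected about 0.01 s)")
```

The same function could be fed a longer capture to see whether the spacing
keeps growing before the lockup.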
Sadly after this message to my serial console and these few ICMP
packets, the machine locks up quite hard, to the point that SysRq
doesn't respond anymore. :-(
This has been happening for a while already and makes suspend+resume
mostly useless on my machine. What other debugging info can I provide to
help with getting this fixed?
I've found out about pm_trace, which always points at the same line (and
no device):
/var/log/syslog.1:Oct 10 16:43:58 ruby kernel: [ 0.780503] Magic number: 0:52:740
/var/log/syslog.1:Oct 10 16:43:58 ruby kernel: [ 0.780599] hash matches /tmp/linux-3.16.3/drivers/base/power/main.c:812
In my source tree that line is:
TRACE_RESUME(error);
Right at the end of device_resume(), under the Complete: label. Note
that I might have to redo this though, as I now realise I had only
recompiled my *kernel* with the PM_TRACE_RTC flag set, not all my
modules, which I assume is not enough. (I'm thinking of filing a Debian
bug requesting this flag to be enabled by default..) However since the
kernel seems to declare the resume as complete I'm not sure whether
pm_trace is still of any use?
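
For repeating the pm_trace run after rebuilding the modules, a small
sketch like the following could pull the "hash matches" locations out of a
saved log so the runs are easy to compare. The helper name
pm_trace_locations and the regular expression are assumptions for
illustration, not part of any kernel tool; the sample text is the syslog
excerpt quoted above.

```python
import re

# Matches the "hash matches <file>:<line>" message quoted above. \s+ also
# tolerates the message being wrapped across two lines by a mail client.
HASH_MATCH = re.compile(r"hash\s+matches\s+(\S+):(\d+)")

def pm_trace_locations(log_text):
    """Return (source_file, line_number) pairs for every hash match found.

    Hypothetical helper for comparing pm_trace results between boots.
    """
    return [(path, int(line)) for path, line in HASH_MATCH.findall(log_text)]

# Sample input: the two syslog lines from this message, wrapped as mailed.
sample = """\
/var/log/syslog.1:Oct 10 16:43:58 ruby kernel: [ 0.780503] Magic
number: 0:52:740
/var/log/syslog.1:Oct 10 16:43:58 ruby kernel: [ 0.780599] hash
matches /tmp/linux-3.16.3/drivers/base/power/main.c:812
"""

locations = pm_trace_locations(sample)
for path, line in locations:
    print(f"{path} line {line}")
```

If a later run with all modules rebuilt reports a different file or line,
that difference would be the interesting part to post.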
With kernels 3.10 and older I have no such problems; I can
suspend+resume as often as I want.
I've already tried to skip the NVidia + VMware modules at boot time (as
you can see from the logs they're not loaded at any point), but it
didn't help. I could try omitting more modules.
I'm attaching a full dmesg of boot + a few suspend+resume cycles in 3.10
and 3.16, and a dump of the serial console showing the last resume cycle
(which I couldn't get from dmesg of course).
You might notice the message about s2ram segfaulting. I've looked at it
and it seems to be in VBE-related code, but this problem occurs even when
I just echo mem to /sys/power/state directly without using s2ram, so I
assume it's not related.
Sorry for the long message. I'd love some ideas for troubleshooting an
issue like this.
"Attachments" in http://roy.gaast.net/~wilmer/.lkml/ since I just
realised >200KB of attachments might not be appreciated. :-)
Cheers,
Wilmer van der Gaast.
--
+-------- .''`. - -- ---+ + - -- --- ---- ----- ------+
| wilmer : :' : gaast.net | | OSS Programmer www.bitlbee.org |
| lintux `. `~' debian.org | | Full-time geek wilmer.gaast.net |
+--- -- - ` ---------------+ +------ ----- ---- --- -- - +