Re: 2.6.30-git(16 and 17) system hangs after resume from suspend to disk, mce related?
From: Andi Kleen
Date: Mon Jun 22 2009 - 02:43:46 EST
Hidetoshi Seto wrote:
Maciej Rutecki wrote:
Also, "a few minutes" suggests something might be going wrong
with the poll handler. Does the problem still happen
if you use CONFIG_X86_NEW_MCE again, but before
resume do
echo 0 > /sys/devices/system/machinecheck/machinecheck0/check_interval
On the other hand, you should get a crash very fast with
echo 1 > /sys/devices/system/machinecheck/machinecheck0/check_interval
I didn't follow the instructions from above, but I found something else. After
normal boot I try:
echo 1 > /sys/devices/system/machinecheck/machinecheck0/check_interval
and I found this in dmesg:
[ 141.704025] ------------[ cut here ]------------
[ 141.704039] WARNING: at arch/x86/kernel/cpu/mcheck/mce.c:1102
mcheck_timer+0xf5/0x100()
I see. At least this warning will be cleared by the following patch.
WARN_ON(smp_processor_id() != data);
But I'm not sure whether this can cause system hangs or not.
It might actually. If two different handlers run on the same CPU
they could re-add a timer twice, which might cause loops in the timer
list etc.
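To illustrate the kind of loop meant here, below is a minimal userspace
sketch, not kernel code. It assumes a circular doubly linked list loosely
modeled on the kernel's list_head; the names node, list_init,
list_add_unchecked, walk_terminates and double_add_creates_loop are
hypothetical and exist only for this example. Linking the same entry a
second time makes it point at itself, so a walk from the head never
returns to the head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a list entry embedded in a timer. */
struct node {
    struct node *next;
    struct node *prev;
};

/* An empty circular list: the head points at itself. */
static void list_init(struct node *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert n right after head WITHOUT checking whether n is already linked. */
static void list_add_unchecked(struct node *n, struct node *head)
{
    struct node *next = head->next;

    next->prev = n;
    n->next = next;
    n->prev = head;
    head->next = n;
}

/* Walk forward from head; true if we get back to head within limit steps. */
static bool walk_terminates(const struct node *head, size_t limit)
{
    const struct node *p = head->next;

    for (size_t i = 0; i < limit; i++) {
        if (p == head)
            return true;
        p = p->next;
    }
    return false;
}

/* Returns 1 if adding the same entry twice leaves a walk that never ends. */
int double_add_creates_loop(void)
{
    struct node head, timer;

    list_init(&head);
    list_add_unchecked(&timer, &head);   /* first add: list is sane */
    if (!walk_terminates(&head, 16))
        return 0;

    list_add_unchecked(&timer, &head);   /* second add: timer.next == &timer */
    return walk_terminates(&head, 16) ? 0 : 1;
}
```

Calling double_add_creates_loop() from a small main() shows the effect.
Whether the real timer code reaches such a state depends on its own
checks; this only shows why a double add is dangerous for a list of
this shape.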
Maciej, can you test Seto-san's patch please?
BTW this is probably related to
commit eea08f32adb3f97553d49a4f79a119833036000a
Author: Arun R Bharadwaj <arun@xxxxxxxxxxxxxxxxxx>
Date: Thu Apr 16 12:16:41 2009 +0530
timers: Logic to move non pinned timers
It might also be useful to test whether reverting that patch makes
the problem go away. But with this patch we need the add_timer_on change.
-Andi