Re: [PATCH] 2.6.xx: sata_mv: another critical fix

From: Sander
Date: Tue Mar 21 2006 - 14:13:39 EST


Linus Torvalds wrote (ao):
> On Tue, 21 Mar 2006, Sander wrote:
> > The system just freezes. Rock solid. No sysrq, no ctrl-alt-del, nothing.
>
> Can you enable the NMI watchdog? It could be a PCI bus lockup (in which
> case nothing will help), but if it's some interrupts-off busy loop
> (whether due to a spinlock deadlock or due to the driver just spinning)
> then nmi-watchdog should help.
>
> Of course, that requires that you have support for local/io-APIC (ie if
> UP, please select CONFIG_X86_UP_.*APIC)

The kernel is compiled for x86-64 and SMP (dual-core Opteron), so if I
understand the NMI watchdog documentation correctly, it is automagically
enabled.

# dmesg | grep -i nmi
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[ 75.280604] testing NMI watchdog ... OK.

# grep -i nmi /proc/interrupts
NMI: 52 43

(The counters seem to increment _very_ slowly.)
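
A rough way to put a number on how slowly the counters move is to sample
/proc/interrupts twice and print the difference. This is only a sketch: it
assumes the NMI line has the form shown above (the "NMI:" label followed by
one count per CPU), and the function names nmi_total and nmi_rate are made
up for illustration.

```shell
#!/bin/sh
# Sum the per-CPU NMI counts found in an interrupts file.
# $1: file to read (defaults to /proc/interrupts).
# Prints 0 if no "NMI:" line is present.
nmi_total() {
    awk '$1 == "NMI:" {
             # Add up only the purely numeric fields (the per-CPU counts).
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^[0-9]+$/) sum += $i
         }
         END { print sum + 0 }' "${1:-/proc/interrupts}"
}

# Print how many NMIs arrived during an interval.
# $1: interval in seconds (defaults to 10, an arbitrary choice).
nmi_rate() {
    interval="${1:-10}"
    before=$(nmi_total)
    sleep "$interval"
    after=$(nmi_total)
    echo "NMIs in ${interval}s: $((after - before))"
}
```

Running "nmi_rate 10" once on the idle box and once during the dd/md5sum
test would show whether the rate depends on load.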

Is there anything else I can do to see some crash info?

Btw, it always seems to crash during the md5sum of this test:

for i in `seq 4`
do
    dd if=/dev/zero of=bigfile.$i bs=1024k count=10000
    dd if=bigfile.$i of=/dev/null bs=1024k count=10000
done
time md5sum bigfile.*
time rm bigfile.*

Only once in many test runs did I need to run this twice before it went
belly up.

I have not been able to make 2.6.16-rc6-mm2 crash yet.

I'll test 2.6.16-rc6-mm1 now.

--
Humilis IT Services and Solutions
http://www.humilis.net