Re: Kernel lockups on dual-Athlon board -- help wanted

From: Eric W. Biederman (ebiederm@xmission.com)
Date: Sat Aug 11 2001 - 11:17:38 EST


"Eric S. Raymond" <esr@thyrsus.com> writes:

> Gary Sandine of Los Alamos Computers and I are attempting to qualify
> Linux on a Tyan 2462 K7 Thunder motherboard -- dual Athlon 1200 MP
> chips supported by an AMD 760 chipset. We have been seeing mysterious
> lockups during commands to build things from source, like kernels and X.
>
> We've been trying to track down the problem for about sixteen hours
> and have gathered quite a bit of data, but don't have a theory to explain
> it.

What kind of case are you running in? I have heard of one other report
that sounds similar, and in that instance the system was in a 1U.
 
> First, we have established that this is a real kernel hang, not just a
> bad device state:
>
> A. Lockups can be induced in either console or X mode. A reliable way to
> induce them is to run `make clean' on an X tree (any sufficiently
> long-running command seems to do it).
>
> B. We logged in over the network, started a top(1) in the network
> session, induced the hang on the console, and watched top(1) freeze.
> So
>
> C. The magic AltSysRq command is ineffective when the lockups happen.
>
> Here's what we know about it:
>
> 1. Lockups never occur under a uniprocessor kernel.
>
> 2. Configuring APM and ACPI out of the kernel does not prevent the lockups.
> Disabling ACPI and power management doesn't stop them either.
>
> 3. Changing kernels from 2.4.3 to 2.4.7 doesn't prevent the lockups.
>
> 4. The SMP kernel built for either PII or AMD (no APM, no ACPI) locks up.
>
> 5. There is an undocumented BIOS setting "Use PCI Interrupt Entries in
> MP table." By default it is on. Turning it off doesn't prevent the
> lockups.

This switches between listing the 4 interrupts that the board uses for PCI
either in the ISA range of interrupts, or routing them to the IOAPIC
above the normal 16 ISA interrupts.
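
One way to check which of the two routings the running kernel ended up
with is to look at /proc/interrupts: IRQ numbers of 16 and above are
outside the ISA range. The following is only a minimal sketch, assuming
the usual layout of that file (a header row naming the CPUs, then one
row per IRQ with a count per CPU). The function name parse_interrupts
is made up for illustration, and the threshold of 16 simply restates
the "normal 16 ISA interrupts" mentioned above.

```python
import os


def parse_interrupts(text):
    """Map each numbered IRQ to (above_isa_range, description).

    Assumes the usual /proc/interrupts layout: a header row naming the
    CPUs, then rows of the form "<irq>: <count per CPU> <description>".
    """
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip non-numeric rows such as NMI: or ERR:
        fields = rest.split()
        description = " ".join(fields[ncpus:])  # drop the per-CPU counts
        irq = int(label)
        # IRQs 0-15 are the ISA range; 16 and up are above it.
        result[irq] = (irq >= 16, description)
    return result


if __name__ == "__main__" and os.path.exists("/proc/interrupts"):
    with open("/proc/interrupts") as f:
        for irq, (above_isa, desc) in sorted(parse_interrupts(f.read()).items()):
            where = "above ISA range" if above_isa else "ISA range"
            print(f"{irq:4d}  {where:16s} {desc}")
```

If the devices on the PCI bus all show up below 16, the kernel is most
likely using the ISA-range entries; if they show up at 16 or above,
they are being routed to the IOAPIC.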
 
> 6. Here's a weird one. When the kernel is running, the power switch
> has to be pressed down for 4 seconds to power down the machine. But
> during a lockup it powers down the machine instantly.
>
> What we're seeing suggests some bad interaction between the SMP
> support and the hardware. But item 7 hints that power management
> could be involved, even though we have it configured out.

The board only uses ACPI, so power management isn't a strong candidate.

I think I have to go with Alan that the most likely case is that the
board is marginal in some respect.

Eric