Re: Problems with EDAC coexisting with BIOS

From: Tim Small
Date: Thu May 04 2006 - 05:44:19 EST


thockin@xxxxxxxxxx wrote:

On Wed, May 03, 2006 at 09:25:01PM +0100, Tim Small wrote:


existing BIOSs, but the EDAC module could reprogram the chipset error-signalling registers, so that an ECC error no longer triggers an


This is key, I think.



SMI. The BIOS SMI handler could then read the signalling registers, and leave the ECC registers well alone if ECC errors are not set to generate an SMI.



The fundamental problem with SMI is that we CAN'T know what it is doing.
I've seen systems which trigger SMI from a GPIO toggled by a clock. I've
seen systems trigger SMI from a chipset-internal periodic timer. I've
seen chipsets route NMI->SMI or even MCE->SMI. If the BIOS is polling the
error status registers from a periodic SMI, we're GOING to lose data.

The big hammer - turn off SMI - is probably OK on some systems, but is not
a general solution. More and more hardware workarounds and features are
SMI based. There are some rather interesting things that can be done in
SMM, *iff* we could get the BIOS out of the way.


Agreed - I have had experience of a system (Intel 855GME chipset based, AMI BIOS) which emulates the i8042 in the BIOS at SMI time. Mmm nice. When the Linux i8042 driver can polled the (pretend) i8042, the system spent ages in the BIOS, and general interrupt latency on the system fell apart... Oh what a mess.

A limited solution is probably to modify the existing EDAC drivers so that they ensure SMI generation is disabled (for the specific errors that the EDAC drivers are designed to handle). The OS is then at least doing the right thing, even if the BIOS isn't... This should improve upon the current behaviour on some systems, and shouldn't (as far as I can see) break any others. The EDAC code also probably needs to be toughened up (at least on some chipsets) so that it doesn't fall over when the BIOS steps on its toes.

Tim.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/