RE: Problems with EDAC coexisting with BIOS

From: Ong, Soo Keong
Date: Mon Apr 24 2006 - 09:59:54 EST


Alan,

Have you understood how the errors are connected to the interrupts
(SMI, NMI, or SCI)?

When does EDAC read the error status? Periodically, or upon an
interrupt raised by errors?

Soo Keong


-----Original Message-----
From: Alan Cox [mailto:alan@xxxxxxxxxxxxxxxxxxx]
Sent: Monday, April 24, 2006 9:19 PM
To: Gross, Mark
Cc: bluesmoke-devel@xxxxxxxxxxxxxxxxxxxxx; LKML; Carbonari, Steven; Ong,
Soo Keong; Wang, Zhenyu Z
Subject: Re: Problems with EDAC coexisting with BIOS

On Fri, 2006-04-21 at 09:01 -0700, Gross, Mark wrote:
> 1) The default AMI BIOS behavior on SMI is to check the chipset error
> registers (Dev0:Fun1) and re-hide them.

The words "bad design" come to mind (followed by a large number of more
accurate phrases that are inappropriate for a public list)

> Basically if device 0 : function 1 is hidden by the platform at boot
> time un-hiding and using the device and function is a risky thing to
> do,

Intel provided patches that do exactly this for some of the chip
workarounds. Are you saying the Intel chip workaround also needs
fixing?

> The driver should never get loaded by default or automatically. If the
> user knows enough about their BIOS to trust that the SMI behavior will
> coexist with the driver then it's OK to load; otherwise using this
> driver is not a safe thing to do.

So Intel and/or the BIOS vendors also forgot to put in any kind of
indicator? How do they expect end users, or OS vendors, to know this?
Is there a technote that covers this mess?

> I think the best thing to do is to have the driver error out in its
> init or probe code if the dev0:fun1 is hidden at boot time.
>
> Comments?

Why did Intel bother implementing this functionality and then screw
it up so that OS vendors can't use it? It seems so bogus.

At the very least we should print a warning advising the user that the
BIOS is incompatible and to ask the BIOS vendor for an update so that
they can enable error detection and management support.
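
As a rough sketch of the check being discussed (this is not actual EDAC
code): the probe refuses to continue and prints a warning when
Dev0:Fun1 is hidden. It assumes that a config read of a hidden function
returns an all-ones vendor ID, as for an absent PCI function. The names
edac_probe_sketch, read_vendor_fn, fake_hidden and fake_visible are
hypothetical, and the read is a user-space stand-in for a real
config-space access.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: a hidden function reads back like an absent one (all ones). */
#define VENDOR_ID_NONE 0xFFFFu

/* Hypothetical stand-in for reading the vendor ID of Dev0:Fun1. */
typedef uint16_t (*read_vendor_fn)(void);

enum probe_result {
    PROBE_OK = 0,
    PROBE_HIDDEN_BY_BIOS = -1
};

/*
 * Bail out instead of un-hiding the device: if the BIOS hid Dev0:Fun1
 * at boot, its SMI handler may be using the error registers itself.
 */
static enum probe_result edac_probe_sketch(read_vendor_fn read_vendor)
{
    assert(read_vendor != NULL);

    if (read_vendor() == VENDOR_ID_NONE) {
        fprintf(stderr,
                "EDAC: Dev0:Fun1 hidden by BIOS; not loading. "
                "Ask your BIOS vendor for an update.\n");
        return PROBE_HIDDEN_BY_BIOS;
    }
    return PROBE_OK;
}

/* Test doubles: one platform that hides the function, one that does not. */
static uint16_t fake_hidden(void)  { return 0xFFFF; }
static uint16_t fake_visible(void) { return 0x8086; /* Intel vendor ID */ }
```

Calling edac_probe_sketch(fake_hidden) takes the warning path, while
edac_probe_sketch(fake_visible) proceeds. A real driver would do the
equivalent test in its init or probe path before touching the device.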

Is only the AMI BIOS this braindamaged, so that we should just blacklist
AMI BIOSes in EDAC, or is this shared Intel-supplied code that may be
found in other vendors' systems?

Alan