Re: [patch] SMP alternatives
From: Eric W. Biederman
Date: Thu Nov 24 2005 - 10:10:26 EST
Andi Kleen <ak@xxxxxxx> writes:
> On Thu, Nov 24, 2005 at 03:15:24PM +0000, Alan Cox wrote:
>> On Iau, 2005-11-24 at 15:22 +0100, Andi Kleen wrote:
>> > What do you need a special driver for if the northbridge just
>> > can do the scrubbing by itself?
>>
>> You need a driver to collect and report all the ECC single bit errors to
>> the user so that they can decide if they have problem hardware.
>
> Assuming the errors are logged to the standard machine check
> architecture that's already done by mce.c. K8 does that definitely.
>
> Take a look at mcelog at some point.
> Your distro probably already sets it up by default to log to
> /var/log/mcelog
>
>>
>> EDAC is more than one thing
>> - Control response to a fatal error
>> - Report non-fatal events for analysis/user decision making
>
> x86-64 mce.c does all that. There was even a port to i386 around at some point.
Right, on the k8 memory controller there is a lot of overlap
with what has already been implemented. For all other x86 memory
controllers the code is filling a large void. The current k8
code has been delayed for this reason.
Where the EDAC code goes beyond the current k8 facilities is in
decoding errors down to the DIMM level, so that the bad memory
stick can be easily identified.
One of the goals of the EDAC code is to work towards a unified
kernel architecture for this kind of error reporting. Currently every
architecture (if the errors are reported at all) handles this
differently, which makes it very hard to do something sane in
user space.
Eric