Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree

From: Andi Kleen
Date: Wed Jul 05 2006 - 15:38:15 EST



Ok, since you didn't address it, I assume you agree that just using
the address to get the DIMM is sufficient. Thanks.

> Our LinuxBIOS engineers have found that the majority of the DMI/SMBIOS
> tables are incorrect and provide a false sense of security in terms of
> getting the right information that is needed in finding failing devices
> (DIMMs).

Hmm, I found a few outliers[1] but most systems I checked were
reasonable or had only small problems. However, I could not
always verify the mappings by pulling out DIMMs.

Anyway, why does LinuxBIOS not just supply a DMI table? That would
seem to me like a vastly more elegant solution than requiring
something in user space to identify the system in other ways.

I don't even want to guess how you identify systems without
a DMI table ...

[1] A few well-known vendors, not to be named, seem to be too lazy
to set the tables up properly and always map all addresses to all DIMMs.
Since it's a serious RAS disadvantage for their systems I suppose
angry customers will sooner or later get that issue fixed though.


> Our users demand 100% correct DIMM labeling for error fault isolation,
> with minimal manual operation - that is the requirement we are trying
> to satisfy. These items are what led to the Bluesmoke/EDAC labeling
> solution pattern.

Ok I can see that. But it makes it a very narrow solution because
other people don't know as much about their hardware as you do.

For mainline Linux we should try to focus support on standard mainstream PC
hardware, firmware and software, not custom systems as you seem to be doing.

If you find wrong SMBIOS tables to be a serious problem I guess
it would be possible to add a way to override them in mcelog.
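
To make the idea concrete, here is a rough sketch of what such an
override could look like: address ranges from the firmware table are
consulted only if no user-supplied range matches first. This is not
mcelog code and not an existing mcelog feature; DimmRange, lookup_dimm
and the labels are hypothetical names, and the model assumes each DIMM
covers one contiguous physical range, ignoring interleaving.

```python
from typing import NamedTuple, Optional, Sequence


class DimmRange(NamedTuple):
    """Hypothetical record: one physical address range and its DIMM label."""
    start: int  # first physical address covered (inclusive)
    end: int    # last physical address covered (inclusive)
    label: str  # label as printed on the board


def lookup_dimm(addr: int,
                firmware: Sequence[DimmRange],
                overrides: Sequence[DimmRange] = ()) -> Optional[str]:
    """Return the DIMM label for addr, preferring user overrides.

    Assumption: ranges are contiguous and non-interleaved.
    """
    # Overrides are searched first so they can correct a wrong firmware table.
    for table in (overrides, firmware):
        for entry in table:
            if entry.start <= addr <= entry.end:
                return entry.label
    return None  # address not covered by any known range


# Example tables (made-up values, for illustration only).
firmware_table = [
    DimmRange(0x00000000, 0x3FFFFFFF, "DIMM0"),
    DimmRange(0x40000000, 0x7FFFFFFF, "DIMM1"),
]
override_table = [
    DimmRange(0x40000000, 0x7FFFFFFF, "CPU1_DIMM_A"),
]
```

With these tables an error address in the second range is reported
under the override label, while addresses outside the override still
fall back to whatever the firmware table says.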

Anyway, you haven't described anything so far that the existing
machine check handler/mcelog cannot do (mcelog with some small tweaks).

-Andi