Re: [RFC PATCH 00/21 v2] amd64_edac: EDAC module for AMD64

From: Andi Kleen
Date: Thu Apr 30 2009 - 08:43:22 EST


> ok, how about we remove the MSR/PCI cfg space reading bits and leave
> that task solely to the mce core. Then, iff you have edac turned on in

That's the minimum fix, but even then the patchkit does a lot of
things, and not all of them necessarily need to go in together.

> Kconfig, mce code delivers needed error info to edac which, in turn,
> goes and decodes the error/does the mapping to DIMM blocks/supplies DRAM
> error injection facility for testing purposes and similar things. That
> way you have both and they don't overlap in functionality.

You can do that, but it's redundant because mcelog can already do
this. I had some conversations with existing EDAC users
recently, and they seem to care only about the resulting output,
so just querying from mcelog is fine.
The only issue is that mcelog needs to get the DIMM data. In many
cases it can do so from SMBIOS output; if not, a suitable interface
would need to be provided by the kernel.
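For the SMBIOS route, a minimal sketch of how a user-space tool could pull DIMM slot labels out of the SMBIOS Type 17 (Memory Device) structures. It assumes the raw SMBIOS structure table has already been read into memory; the names dimm_info, smbios_dimms, smbios_string and rd16 are hypothetical and not taken from mcelog. The offsets follow the SMBIOS layout (size word at 0x0C, Device Locator string number at 0x10), and the SMBIOS 2.7 Extended Size field is deliberately not decoded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: one DIMM taken from an SMBIOS Type 17 structure. */
struct dimm_info {
	uint16_t handle;      /* SMBIOS structure handle */
	uint32_t size_mb;     /* 0 = empty slot, unknown, or not decoded here */
	const char *locator;  /* Device Locator string, i.e. the slot label */
};

/* SMBIOS fields are little-endian. */
static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Return the n-th (1-based) string in [s, end), or "" if n is 0 or
 * the string set has fewer than n entries. */
static const char *smbios_string(const uint8_t *s, const uint8_t *end,
				 unsigned n)
{
	if (n == 0)
		return "";
	while (s < end && *s) {
		const uint8_t *nul = memchr(s, 0, (size_t)(end - s));
		if (!nul)
			break;
		if (--n == 0)
			return (const char *)s;
		s = nul + 1;
	}
	return "";
}

/* Walk a raw SMBIOS structure table and collect Type 17 entries.
 * Returns the number of records written to out. */
static size_t smbios_dimms(const uint8_t *tbl, size_t len,
			   struct dimm_info *out, size_t max)
{
	const uint8_t *p = tbl, *end = tbl + len;
	size_t n = 0;

	while (end - p >= 4) {
		uint8_t type = p[0], flen = p[1];
		if (flen < 4 || flen > (size_t)(end - p))
			break;                      /* malformed header */

		/* The string set follows the formatted area and is
		 * terminated by a double NUL. */
		const uint8_t *s = p + flen, *next = s;
		while (end - next >= 2 && (next[0] || next[1]))
			next++;
		if (end - next < 2)
			break;                      /* unterminated string set */
		next += 2;

		if (type == 17 && flen >= 0x11 && n < max) {
			uint16_t sz = rd16(p + 0x0C);
			out[n].handle = rd16(p + 2);
			if (sz == 0 || sz == 0xFFFF || sz == 0x7FFF)
				out[n].size_mb = 0; /* empty, unknown, or Extended Size (skipped) */
			else if (sz & 0x8000)
				out[n].size_mb = (sz & 0x7FFF) / 1024; /* KB units */
			else
				out[n].size_mb = sz;                    /* MB units */
			out[n].locator = smbios_string(s, next, p[0x10]);
			n++;
		}
		if (type == 127)
			break;                      /* end-of-table structure */
		p = next;
	}
	return n;
}
```

The locator string is what lets a corrected-error count be attributed to a physical slot label; the kernel interface mentioned above would only be needed where the firmware tables are missing or wrong.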

> By the way, I think there's a similar attempt/proposal of letting mce
> and edac talk to each other from Red Hat so I think this could be a

There was a fairly dubious patch floating around, I think, but it
had a couple of problems.

> > -Andi (who thinks all of this decoding should be in user space anyways)
>
> Think of a big data center with thousands of 2-, 4- and 8-socket blades
> and the admin collecting mce output and running around decoding the

Nobody said anything about admins decoding on their workstation.

Corrected events (which are the 90+% case) get decoded in user space on the
same system. Uncorrected events get decoded after the reboot. Both happen
automatically and transparently.
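To make the corrected/uncorrected split concrete, a minimal sketch of the first step a user-space decoder typically takes with a logged MCi_STATUS value: deciding which class of event it is. The bit positions are the architectural MCi_STATUS bits shared by Intel and AMD; the enum and function names are hypothetical and not taken from mcelog.

```c
#include <stdint.h>

/* Architectural MCi_STATUS bits. */
#define MCI_STATUS_VAL   (1ULL << 63)  /* register contents valid */
#define MCI_STATUS_OVER  (1ULL << 62)  /* error overflow */
#define MCI_STATUS_UC    (1ULL << 61)  /* uncorrected error */
#define MCI_STATUS_EN    (1ULL << 60)  /* error reporting enabled */
#define MCI_STATUS_MISCV (1ULL << 59)  /* MCi_MISC holds valid data */
#define MCI_STATUS_ADDRV (1ULL << 58)  /* MCi_ADDR holds valid data */
#define MCI_STATUS_PCC   (1ULL << 57)  /* processor context corrupt */

/* Hypothetical classification used to route an event. */
enum mce_class { MCE_INVALID, MCE_CORRECTED, MCE_UNCORRECTED, MCE_FATAL };

static enum mce_class classify(uint64_t status)
{
	if (!(status & MCI_STATUS_VAL))
		return MCE_INVALID;     /* nothing logged in this bank */
	if (!(status & MCI_STATUS_UC))
		return MCE_CORRECTED;   /* the common case: decode on the running system */
	if (status & MCI_STATUS_PCC)
		return MCE_FATAL;       /* UC with context corrupt: kernel cannot continue */
	return MCE_UNCORRECTED;
}

/* Bits 15:0 hold the architectural MCA error code. */
static uint16_t mca_error_code(uint64_t status)
{
	return (uint16_t)(status & 0xFFFF);
}

/* Bits 31:16 hold the model-specific error code. */
static uint16_t model_error_code(uint64_t status)
{
	return (uint16_t)((status >> 16) & 0xFFFF);
}
```

Only the corrected class needs live handling; the others are normally left for the post-reboot decode.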

-Andi

--
ak@xxxxxxxxxxxxxxx -- Speaking for myself only.