Re: + edac-new-opteron-athlon64-memory-controller-driver.patch added to -mm tree
From: Eric W. Biederman
Date: Thu Jul 06 2006 - 11:31:19 EST
Andi Kleen <ak@xxxxxx> writes:
> On Thu, Jul 06, 2006 at 12:12:14AM -0600, Eric W. Biederman wrote:
>>
>> knows the DIMM by requires the reading of hardware registers,
>> some that are not easily accessible to user space so a kernel driver
>> tends to make sense, just to get the information.
>>
>> Possibly we could just export that information and let the
>> user space figure it out from there. But memory is a key system
>
> You can do it completely in user space. See mcelog as proof.
>
> And figuring out the channel in a lot of code etc. seems overkill to me - or
> at least i haven't gotten an explanation why it's better than just
> using the reported address.
So breaking this down simply.
With EDAC, on my next boot I get positive confirmation of whether I
pulled the DIMM that the error happened on or a different DIMM.
Mapping the hardware addresses to the motherboard silk screen labels
beforehand is unnecessary; it just ensures that you pull out the DIMM
you are aiming for on the first try, which makes it an optimization for
people who do that a lot.
To the best of my knowledge mcelog even with the --dmi option cannot
give me that.
Knowing that we actually pulled out the DIMM that the errors were
reported against is what we get by going beyond the address
in the machine check.
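As a rough illustration of that check (a sketch only, not part of EDAC
or mcelog): snapshot the per-row corrected error counts before pulling
a DIMM, snapshot again after the next boot, and compare. The helper
names read_csrow_counts and rows_removed are hypothetical. The sysfs
layout (mcN/csrowN/ce_count under /sys/devices/system/edac/mc) and the
idea that an emptied row stops being listed are assumptions that may
not hold for every driver or kernel version.

```python
import os
import re


def read_csrow_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {(mc, csrow): corrected error count} for each row found.

    Assumes the mcN/csrowN/ce_count layout; rows whose counter cannot
    be read or parsed are skipped rather than reported as zero.
    """
    counts = {}
    if not os.path.isdir(edac_root):
        return counts
    for mc in sorted(os.listdir(edac_root)):
        if not re.fullmatch(r"mc\d+", mc):
            continue
        mc_dir = os.path.join(edac_root, mc)
        for row in sorted(os.listdir(mc_dir)):
            if not re.fullmatch(r"csrow\d+", row):
                continue
            ce_path = os.path.join(mc_dir, row, "ce_count")
            try:
                with open(ce_path) as f:
                    counts[(mc, row)] = int(f.read().strip())
            except (OSError, ValueError):
                continue
    return counts


def rows_removed(before, after):
    """Rows present in the earlier snapshot but missing from the later one."""
    return sorted(set(before) - set(after))
```

If the row that was accumulating errors in the first snapshot shows up
in rows_removed(before, after), the right DIMM came out; if some other
row is listed instead, a different DIMM was pulled.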
Does that cross the communication divide?
Eric