Re: [EDAC PATCH v13 6/7] edac.h: Prepare to handle with generic layers

From: Mauro Carvalho Chehab
Date: Wed Apr 25 2012 - 14:44:44 EST


On 25-04-2012 15:32, Luck, Tony wrote:
>> See the driver: the only useful information provided by the MCA log is
>> that an error happened, its physical address, and the type of the
>> error. Unlike the Nehalem MCA, the MCE_MISC registers don't point to the
>> DIMM that had the error.
>
> There's a bit more information in the MCA log than just the physical address:
>
> The cpu number that finds the data in its bank will provide socket information.
> [/proc/cpuinfo maps logical cpu numbers to "physical id"]

Yes, but this seems to be different from the CPU that actually has the memory
controller. The MCA registers have a bit that marks whether the error is on the
same CPU or on another one. So, when there are just two CPUs (sockets), this could
be used, but for more than two CPUs this field is useless.

So I opted not to rely on it.
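For reference, the logical-CPU-to-socket mapping Tony mentions can be recovered from the "processor" and "physical id" fields of /proc/cpuinfo. The sketch below is a hypothetical helper (parse_cpu_sockets is not an existing kernel or libc function); it assumes the caller has already read the file into a NUL-terminated buffer, and it only gives the socket of the CPU that logged the error, which, as noted above, may not be the socket that owns the memory controller.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: fill socket_of[cpu] with the "physical id" that
 * follows each "processor" entry in cpuinfo-formatted text. CPUs that are
 * never seen keep the value -1. Returns the number of "processor" entries.
 */
static int parse_cpu_sockets(const char *text, int socket_of[], int max_cpus)
{
	int cpu = -1, count = 0;
	const char *line = text;

	for (int i = 0; i < max_cpus; i++)
		socket_of[i] = -1;

	while (line && *line) {
		const char *next = strchr(line, '\n');
		int val;

		/* A space in a scanf format matches any run of blanks, including tabs. */
		if (sscanf(line, "processor : %d", &val) == 1) {
			cpu = val;
			count++;
		} else if (sscanf(line, "physical id : %d", &val) == 1) {
			if (cpu >= 0 && cpu < max_cpus)
				socket_of[cpu] = val;
		}
		line = next ? next + 1 : NULL;
	}
	return count;
}
```

A caller would read /proc/cpuinfo into a buffer and then look up socket_of[m->cpu] for the CPU that reported the machine check.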

> Low order bits of the MCi_STATUS register will give the channel. See the SDM.

In all the tests I did, the channel information reported via MCi_STATUS didn't
match the channel reported by the decoding logic. This might be due to a bug
in the pre-release CPUs I have used so far.
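A minimal sketch of the MCi_STATUS decoding discussed above, assuming the architectural layout from the Intel SDM: VAL is bit 63, ADDRV is bit 58, and a memory-controller error uses the compound MCA error code pattern 000F 0000 1MMM CCCC in bits 15:0, where CCCC is the channel (0xF meaning "not specified") and MMM is the transaction type. The struct and function names are hypothetical, and given the mismatch described above, the channel field should be cross-checked against the driver's own address decoding rather than trusted blindly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural MCi_STATUS bits (per the Intel SDM machine-check chapter). */
#define MCI_STATUS_VAL   (1ULL << 63) /* register holds a valid error */
#define MCI_STATUS_ADDRV (1ULL << 58) /* MCi_ADDR holds the error address */

/* Hypothetical container for the fields this sketch extracts. */
struct mc_error_info {
	bool valid;        /* VAL set */
	bool addr_valid;   /* ADDRV set: MCi_ADDR can be used */
	bool is_memctrl;   /* error code matches the memory-controller pattern */
	unsigned int xact; /* MMM: memory transaction type, bits 6:4 */
	int channel;       /* CCCC: bits 3:0, or -1 when 0xF / not a memctrl error */
};

static struct mc_error_info decode_mci_status(uint64_t status)
{
	struct mc_error_info info = { 0 };
	uint16_t code = status & 0xffff; /* MCA error code, bits 15:0 */

	info.valid = (status & MCI_STATUS_VAL) != 0;
	info.addr_valid = (status & MCI_STATUS_ADDRV) != 0;

	/* 000F 0000 1MMM CCCC: ignore the filter bit (12), require bit 7 only. */
	info.is_memctrl = (code & 0xef80) == 0x0080;
	if (info.is_memctrl) {
		info.xact = (code >> 4) & 0x7;
		info.channel = ((code & 0xf) == 0xf) ? -1 : (int)(code & 0xf);
	} else {
		info.channel = -1;
	}
	return info;
}
```

For example, a status of 0x8400000000000091 decodes as a valid memory-controller error with a usable address, transaction type 1 and channel 1.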

> So the only missing information from the MCA log is which DIMM within
> the channel. I.e. we can pin the fault to a group of either two or
> three DIMMs depending on how many DIMMs/channel the motherboard supports.
>
> If you only have one DIMM per channel populated, then socket/channel is
> sufficient to identify the DIMM.
>
> [We also don't have any intra-DIMM information for those customers who
> would like to diagnose the device on the DIMM, or which bits within
> the cache line had the error]
>
> -Tony
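Following Tony's point about narrowing the error to a group of two or three DIMMs, the sketch below shows the idea under a hypothetical topology table (the populated array and the MAX_* limits are illustrative, not taken from any driver): given socket and channel, count the populated slots, and only when exactly one is populated can the error be attributed to a single DIMM.

```c
#include <stdbool.h>

/* Hypothetical topology limits; real values depend on the board. */
#define MAX_SOCKETS  4
#define MAX_CHANNELS 4
#define MAX_SLOTS    3 /* two or three DIMMs per channel */

/*
 * Count populated DIMMs in (socket, channel). When exactly one is
 * populated, store its slot in *slot; otherwise set *slot to -1, meaning
 * the error can only be reported against the whole channel group.
 */
static int dimm_candidates(bool populated[MAX_SOCKETS][MAX_CHANNELS][MAX_SLOTS],
			   int socket, int channel, int *slot)
{
	int count = 0, last = -1;

	*slot = -1;
	if (socket < 0 || socket >= MAX_SOCKETS ||
	    channel < 0 || channel >= MAX_CHANNELS)
		return 0;

	for (int s = 0; s < MAX_SLOTS; s++) {
		if (populated[socket][channel][s]) {
			count++;
			last = s;
		}
	}
	if (count == 1)
		*slot = last;
	return count;
}
```

The remaining ambiguity (which DIMM within a multi-DIMM channel, or which device on the DIMM) still needs information that the MCA log doesn't carry.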

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/