Re: [PATCH 1/1] EDAC/{skx_common,i10nm}: Fix some missing error reports on Emerald Rapids
From: Luck, Tony
Date: Thu Feb 20 2025 - 20:16:09 EST

On Fri, Feb 14, 2025 at 08:27:28AM +0800, Qiuxu Zhuo wrote:
> When doing error injection to some memory DIMMs on certain Intel Emerald
> Rapids servers, the i10nm_edac missed error reports for some memory DIMMs.
>
> Certain BIOS configurations may hide some memory controllers, and the
> i10nm_edac doesn't enumerate these hidden memory controllers. However, the
> ADXL decodes memory errors using memory controller physical indices even
> if there are hidden memory controllers. Therefore, the memory controller
> physical indices reported by the ADXL may mismatch the logical indices
> enumerated by the i10nm_edac, resulting in missed error reports for some
> memory DIMMs.
>
> Fix this issue by creating a mapping table from memory controller physical
> indices (used by the ADXL) to logical indices (used by the i10nm_edac) and
> using it to convert the physical indices to the logical indices during the
> error handling process.
>
> Fixes: c545f5e41225 ("EDAC/i10nm: Skip the absent memory controllers")
> Reported-by: Kevin Chang <kevin1.chang@xxxxxxxxx>
> Tested-by: Kevin Chang <kevin1.chang@xxxxxxxxx>
> Reported-by: Thomas Chen <Thomas.Chen@xxxxxxxxx>
> Tested-by: Thomas Chen <Thomas.Chen@xxxxxxxxx>
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@xxxxxxxxx>
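
For readers following along, the physical-to-logical mapping described in the commit message above could look roughly like the sketch below. This is not the code from the patch: the names `MAX_IMC`, `IMC_ABSENT`, `mc_phys_to_log`, `mc_map_build()` and `mc_phys_to_logical()` are hypothetical, and the presence bitmask stands in for whatever the driver actually learns during enumeration. It only illustrates the idea that hidden controllers consume a physical index but not a logical one.

```c
#include <stdint.h>

#define MAX_IMC    16    /* assumed upper bound, for illustration only */
#define IMC_ABSENT 0xff  /* marker for a hidden/absent memory controller */

/* Hypothetical table: index is the physical MC index (as a decoder such
 * as ADXL would report it), value is the logical index assigned during
 * enumeration, or IMC_ABSENT if that controller was skipped. */
static uint8_t mc_phys_to_log[MAX_IMC];

/* Build the table from a bitmask of present controllers (bit N set means
 * physical controller N was enumerated). Returns the number of logical
 * controllers. */
static int mc_map_build(uint32_t present_mask)
{
	int logical = 0;

	for (unsigned int phys = 0; phys < MAX_IMC; phys++) {
		if (present_mask & (1u << phys))
			mc_phys_to_log[phys] = (uint8_t)logical++;
		else
			mc_phys_to_log[phys] = IMC_ABSENT;
	}
	return logical;
}

/* Convert a physical index to a logical one during error handling.
 * Returns -1 if the index is out of range or the controller is hidden. */
static int mc_phys_to_logical(unsigned int phys)
{
	if (phys >= MAX_IMC || mc_phys_to_log[phys] == IMC_ABSENT)
		return -1;
	return mc_phys_to_log[phys];
}
```

With controller 1 hidden (mask 0x0d), physical controllers 0, 2 and 3 map to logical 0, 1 and 2. Using the physical index 2 directly as a logical index would pick the wrong controller, and physical index 3 would fall outside the three enumerated ones, which is the kind of mismatch that leads to a missed report.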

Applied to the edac-drivers branch of the RAS tree.
Thanks
-Tony