Re: [PATCH v7] x86/mce: retrieve poison range from hardware

From: Jane Chu
Date: Fri Aug 26 2022 - 18:12:40 EST


On 8/26/2022 11:09 AM, Borislav Petkov wrote:
> On Fri, Aug 26, 2022 at 10:54:31AM -0700, Dan Williams wrote:
>> How about:
>>
>> ---
>>
>> When memory poison consumption machine checks fire,
>> mce-notifier-handlers like nfit_handle_mce() record the impacted
>> physical address range.
>
> ... which is reported by the hardware in the MCi_MISC MSR.
>
>> The error information includes data about blast
>> radius, i.e. how many cachelines did the hardware determine are
>> impacted.
>
> Yap, nice.
>
>> A recent change, commit 7917f9cdb503 ("acpi/nfit: rely on
>> mce->misc to determine poison granularity"), updated nfit_handle_mce()
>> to stop hard coding the blast radius value of 1 cacheline, and instead
>> rely on the blast radius reported in 'struct mce' which can be up to 4K
>> (64 cachelines).
>>
>> It turns out that apei_mce_report_mem_error() had a similar problem in
>> that it hard coded a blast radius of 4K rather than checking the blast
>
> s/checking/reading/
>
>> radius in the error information. Fix apei_mce_report_mem_error() to
>
> s/in/from/
>
>> convey the proper poison granularity.
>>
>> ---
>
> Yap, that's a lot better.
>
> Thanks!
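
For readers of the archive, the "blast radius" in the text quoted above can
be sketched roughly as follows. This is only a user-space illustration, not
the patch itself. It assumes the low six bits of the MCi_MISC value hold the
least significant valid bit of the reported address, so the impacted range is
2^lsb bytes. The names misc_addr_lsb() and poison_range_from_mce() and the
struct poison_range type are hypothetical and do not exist in the kernel.

```c
#include <stdint.h>

/* Assumption: 64-byte cachelines, as on current x86 parts. */
#define CACHELINE_SIZE 64u

/* Hypothetical helper: bits 5:0 of MCi_MISC, the address LSB field. */
static unsigned int misc_addr_lsb(uint64_t misc)
{
	return (unsigned int)(misc & 0x3f);
}

/* Hypothetical type describing the impacted physical range. */
struct poison_range {
	uint64_t start;	/* first impacted byte, aligned to len */
	uint64_t len;	/* blast radius in bytes (2^lsb) */
};

/*
 * Derive the impacted range from the reported address and the MCi_MISC
 * value instead of hard coding one cacheline or one 4K page.
 */
static struct poison_range poison_range_from_mce(uint64_t addr, uint64_t misc)
{
	struct poison_range r;

	r.len = 1ULL << misc_addr_lsb(misc);
	r.start = addr & ~(r.len - 1);
	return r;
}

/* Number of cachelines covered by the range, at least one. */
static uint64_t poison_range_cachelines(struct poison_range r)
{
	uint64_t n = r.len / CACHELINE_SIZE;

	return n ? n : 1;
}
```

Under these assumptions an LSB of 6 describes a single cacheline and an LSB
of 12 describes a 4K range (64 cachelines), which are the two values that the
quoted text says were previously hard coded.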


Got it and points taken. Thank you both, Boris and Dan.

v8 coming up.

thanks,
-jane