RE: [PATCH v2 12/16] x86/mce: Unify AMD THR handler with MCA Polling
From: Zhuo, Qiuxu
Date: Tue Feb 18 2025 - 01:43:22 EST
> From: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> Sent: Friday, February 14, 2025 12:46 AM
> To: x86@xxxxxxxxxx; Luck, Tony <tony.luck@xxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-edac@xxxxxxxxxxxxxxx;
> Smita.KoralahalliChannabasappa@xxxxxxx; Yazen Ghannam
> <yazen.ghannam@xxxxxxx>
> Subject: [PATCH v2 12/16] x86/mce: Unify AMD THR handler with MCA Polling
>
> AMD systems optionally support an MCA thresholding interrupt. The interrupt
> should be used as another signal to trigger MCA polling. This is similar to how
> the Intel Corrected Machine Check interrupt (CMCI) is handled.
>
> AMD MCA thresholding is managed using the MCA_MISC registers within an
> MCA bank. The OS will need to modify the hardware error count field in order
> to reset the threshold limit and rearm the interrupt. Management of the
> MCA_MISC register should be done as a follow up to the basic MCA polling
s/follow up/follow-up/
> flow. It should not be the main focus of the interrupt handler.
>
> Furthermore, future systems will have the ability to send an MCA
> thresholding interrupt to the OS even when the OS does not manage the
> feature, i.e. MCA_MISC registers are Read-as-Zero/Locked.
>
> Call the common MCA polling function when handling the MCA thresholding
> interrupt. This will allow the OS to find any valid errors whether or not the
> MCA thresholding feature is OS-managed. Also, this allows the common MCA
> polling options and kernel parameters to apply to AMD systems.
>
> Add a callback to the MCA polling function to check and reset any threshold
> blocks that have reached their threshold limit.
>
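
To check my reading of the flow described above, here is a minimal
user-space sketch. It is not the patch's code: every name in it
(sim_bank, sim_poll_banks, sim_reset_thr_block, sim_thr_interrupt,
NR_BANKS, THRESHOLD_LIMIT) is hypothetical, and the bank is a simplified
model rather than the real MCA_STATUS/MCA_MISC layout. The interrupt
handler only calls the common polling routine; the threshold reset runs
as a per-bank callback afterwards and does nothing when the OS does not
manage the block.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_BANKS        4   /* hypothetical bank count */
#define THRESHOLD_LIMIT 3   /* hypothetical error count that raises the interrupt */

/* Simplified stand-in for one MCA bank; not the real register layout. */
struct sim_bank {
	bool status_valid;      /* models a valid error logged in the bank */
	bool misc_os_managed;   /* false models MCA_MISC being Read-as-Zero/Locked */
	unsigned int err_count; /* models the hardware error count field */
};

static struct sim_bank banks[NR_BANKS];
static unsigned int errors_logged;
static unsigned int blocks_reset;

/* Follow-up step: reset a block that reached its limit, rearming the interrupt. */
static void sim_reset_thr_block(unsigned int bank)
{
	struct sim_bank *b;

	assert(bank < NR_BANKS);
	b = &banks[bank];

	/* Nothing to manage when the OS does not own the threshold block. */
	if (!b->misc_os_managed)
		return;

	if (b->err_count < THRESHOLD_LIMIT)
		return;

	b->err_count = 0;
	blocks_reset++;
}

/* Common polling flow: log any valid error, then run the per-bank callback. */
static void sim_poll_banks(void (*after_poll)(unsigned int bank))
{
	unsigned int bank;

	for (bank = 0; bank < NR_BANKS; bank++) {
		if (banks[bank].status_valid) {
			errors_logged++;
			banks[bank].status_valid = false;
		}

		if (after_poll)
			after_poll(bank);
	}
}

/* Thresholding interrupt handler: polling first, threshold handling second. */
static void sim_thr_interrupt(void)
{
	sim_poll_banks(sim_reset_thr_block);
}
```

In this model, a bank whose threshold block is not OS-managed still has
its valid error found by the poll, which is how I understand the
"whether or not the MCA thresholding feature is OS-managed" case.
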
> Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@xxxxxxxxx>