Re: [PATCH v2 12/16] x86/mce: Unify AMD THR handler with MCA Polling
From: Yazen Ghannam
Date: Wed Feb 19 2025 - 11:08:53 EST

On Tue, Feb 18, 2025 at 06:42:52AM +0000, Zhuo, Qiuxu wrote:
> > From: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> > Sent: Friday, February 14, 2025 12:46 AM
> > To: x86@xxxxxxxxxx; Luck, Tony <tony.luck@xxxxxxxxx>
> > Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-edac@xxxxxxxxxxxxxxx;
> > Smita.KoralahalliChannabasappa@xxxxxxx; Yazen Ghannam
> > <yazen.ghannam@xxxxxxx>
> > Subject: [PATCH v2 12/16] x86/mce: Unify AMD THR handler with MCA Polling
> >
> > AMD systems optionally support an MCA thresholding interrupt. The interrupt
> > should be used as another signal to trigger MCA polling. This is similar to how
> > the Intel Corrected Machine Check interrupt (CMCI) is handled.
> >
> > AMD MCA thresholding is managed using the MCA_MISC registers within an
> > MCA bank. The OS will need to modify the hardware error count field in order
> > to reset the threshold limit and rearm the interrupt. Management of the
> > MCA_MISC register should be done as a follow up to the basic MCA polling
>
> s/follow up/follow-up/
>

Ack.

> > flow. It should not be the main focus of the interrupt handler.
> >
> > Furthermore, future systems will have the ability to send an MCA
> > thresholding interrupt to the OS even when the OS does not manage the
> > feature, i.e. MCA_MISC registers are Read-as-Zero/Locked.
> >
> > Call the common MCA polling function when handling the MCA thresholding
> > interrupt. This will allow the OS to find any valid errors whether or not the
> > MCA thresholding feature is OS-managed. Also, this allows the common MCA
> > polling options and kernel parameters to apply to AMD systems.
> >
> > Add a callback to the MCA polling function to check and reset any threshold
> > blocks that have reached their threshold limit.
> >
> > Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
>
> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@xxxxxxxxx>
>
Thanks,
Yazen
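
For readers following the thread, the flow described in the quoted commit message (the thresholding interrupt calls the common polling function, which then runs a callback to reset threshold blocks) can be sketched as a small user-space C model. This is only an illustration under stated assumptions, not the patch itself: every name here (`sim_bank`, `sim_poll_banks`, `sim_reset_threshold`, `sim_threshold_interrupt`, `SIM_THRESHOLD_LIMIT`) is hypothetical, and the error counter is simplified to count up from zero rather than modelling the real MCA_MISC field layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_THRESHOLD_LIMIT 3	/* hypothetical limit, for illustration only */

/* Hypothetical model of one MCA bank; not the kernel's data layout. */
struct sim_bank {
	bool     status_valid;	/* models a valid error being present in the bank */
	bool     misc_locked;	/* models MCA_MISC being Read-as-Zero/Locked */
	uint32_t err_count;	/* simplified model of the hardware error count */
	unsigned logged;	/* errors picked up by the poller */
	unsigned rearmed;	/* times the threshold was reset */
};

/* Optional follow-up hook run for each bank after the basic polling step. */
typedef void (*sim_poll_cb)(struct sim_bank *bank);

/*
 * Common polling flow: log and clear any valid error, then run the
 * follow-up callback if one was supplied. Returns the number of errors found.
 */
static unsigned sim_poll_banks(struct sim_bank *banks, size_t n, sim_poll_cb cb)
{
	unsigned found = 0;

	for (size_t i = 0; i < n; i++) {
		if (banks[i].status_valid) {
			banks[i].logged++;
			banks[i].status_valid = false;
			found++;
		}
		if (cb)
			cb(&banks[i]);
	}
	return found;
}

/*
 * Follow-up step: reset the count only when the OS manages the feature
 * and the block has reached its limit.
 */
static void sim_reset_threshold(struct sim_bank *bank)
{
	if (bank->misc_locked)
		return;		/* not OS-managed: nothing to rearm */

	if (bank->err_count >= SIM_THRESHOLD_LIMIT) {
		bank->err_count = 0;
		bank->rearmed++;
	}
}

/* The interrupt is just another trigger for the common polling flow. */
static unsigned sim_threshold_interrupt(struct sim_bank *banks, size_t n)
{
	return sim_poll_banks(banks, n, sim_reset_threshold);
}
```

In this model a bank whose MISC register is locked still has its error logged by the poller; only the reset step is skipped, which mirrors the point that errors are found whether or not the feature is OS-managed.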