RE: [PATCH v2 16/16] x86/mce: Handle AMD threshold interrupt storms
From: Zhuo, Qiuxu
Date: Tue Feb 18 2025 - 08:52:20 EST
> From: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> Sent: Friday, February 14, 2025 12:46 AM
> To: x86@xxxxxxxxxx; Luck, Tony <tony.luck@xxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-edac@xxxxxxxxxxxxxxx;
> Smita.KoralahalliChannabasappa@xxxxxxx; Yazen Ghannam
> <yazen.ghannam@xxxxxxx>
> Subject: [PATCH v2 16/16] x86/mce: Handle AMD threshold interrupt storms
>
> From: Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx>
>
> Extend the logic of handling CMCI storms to AMD threshold interrupts.
>
> Use an approach similar to Intel's CMCI handling to mitigate storms per CPU
> and per bank. Unlike CMCI, however, do not adjust thresholds to reduce the
> interrupt rate during a storm. Instead, disable the interrupt on the
> affected CPU and bank. Re-enable the interrupt once enough consecutive polls
> of the bank show no corrected errors (30, as programmed by Intel).
>
> Turning off the threshold interrupts would be a better solution on AMD
> systems as other error severities will still be handled even if the threshold
> interrupts are disabled.
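
For anyone following along, the per-CPU, per-bank policy described above can
be sketched roughly as below. This is a user-space illustration only, not code
from the patch: struct bank_storm, storm_on_interrupt(), storm_on_poll() and
STORM_BEGIN_IRQS are hypothetical names, and the storm-begin count of 5 is an
assumed value. Only the count of 30 clean polls comes from the commit message.

```c
#include <stdbool.h>

/* Assumed for illustration: interrupts between clean polls that start a storm. */
#define STORM_BEGIN_IRQS   5
/* From the commit message: consecutive error-free polls that end a storm. */
#define STORM_END_POLLS   30

/* Hypothetical state kept for one bank on one CPU. */
struct bank_storm {
	bool in_storm;          /* true while the threshold interrupt is disabled */
	unsigned int irq_run;   /* interrupts seen since the last clean poll */
	unsigned int clean_run; /* consecutive polls with no corrected error */
};

/*
 * Account for one threshold interrupt on this CPU/bank.
 * Returns true when a storm has just begun, i.e. the caller should
 * disable the threshold interrupt for this bank and rely on polling.
 */
bool storm_on_interrupt(struct bank_storm *b)
{
	if (b->in_storm)
		return false;
	if (++b->irq_run < STORM_BEGIN_IRQS)
		return false;

	b->in_storm = true;
	b->clean_run = 0;
	return true;
}

/*
 * Account for one poll of this bank.
 * Returns true when the storm has just ended, i.e. the caller should
 * re-enable the threshold interrupt for this bank.
 */
bool storm_on_poll(struct bank_storm *b, bool saw_corrected_error)
{
	if (saw_corrected_error) {
		/* Any corrected error restarts the count of clean polls. */
		b->clean_run = 0;
		return false;
	}
	if (!b->in_storm) {
		/* A clean poll outside a storm resets the interrupt run. */
		b->irq_run = 0;
		return false;
	}
	if (++b->clean_run < STORM_END_POLLS)
		return false;

	b->in_storm = false;
	b->irq_run = 0;
	b->clean_run = 0;
	return true;
}
```

In this sketch a single corrected error seen while polling resets the clean
count, so the interrupt only comes back after 30 quiet polls in a row.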
>
> [Tony: Small tweak because mce_handle_storm() isn't a pointer now]
> [Yazen: Rebase and simplify]
>
> Signed-off-by: Smita Koralahalli
> <Smita.KoralahalliChannabasappa@xxxxxxx>
> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
LGTM.
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@xxxxxxxxx>