[PATCH v2 00/16] MCA Updates

From: Yazen Ghannam
Date: Thu Apr 04 2024 - 11:17:56 EST


Hi all,

This set is a collection of logically independent updates that touch
common code. I've collected them together to resolve conflicts and
ordering dependencies. It is also the first half of a larger set. The
second half focuses on refactoring the AMD MCA Thresholding feature
support and will add AMD interrupt storm handling on top of the
refactored code, so I've left it out for now. Please see the link
below for a work-in-progress branch with the remaining changes.

Patches 1-2 deal with BERT MCA decode and preemption.

Patches 3-8 are general refactoring in preparation for later patches in
this set and the second planned set. The overall theme is to simplify
the AMD MCA init flow and to remove unnecessary data caching in per-CPU
variables. The init flow refactor will be completed in the second patch
set, since much of the cached data is used to set up MCA Thresholding.

Patches 9-10 unify the AMD THR and DFR interrupt handlers with MCA
polling.

Patch 11 is a small cleanup for the MCA Thresholding init path.

Patch 12 adds support for a new Corrected Error Interrupt on Scalable
MCA systems.

Patches 13-16 add support for new Scalable MCA registers and the FRU
Text decoding feature.

Thanks,
Yazen

Branch for this set:
https://github.com/AMDESE/linux/tree/mca-updates-v2

Branch for remaining changes (work-in-progress):
https://github.com/AMDESE/linux/tree/wip-mca

Link:
https://lkml.kernel.org/r/20231118193248.1296798-1-yazen.ghannam@xxxxxxx

Avadhut Naik (2):
x86/mce: Add wrapper for struct mce to export vendor specific info
x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers

Yazen Ghannam (14):
x86/mce: Define mce_setup() helpers for common and per-CPU fields
x86/mce: Use mce_setup() helpers for apei_smca_report_x86_error()
x86/mce/amd: Use fixed bank number for quirks
x86/mce/amd: Look up bank type by IPID
x86/mce/amd: Clean up SMCA configuration
x86/mce/amd: Prep DFR handler before enabling banks
x86/mce/amd: Simplify DFR handler setup
x86/mce/amd: Clean up enable_deferred_error_interrupt()
x86/mce: Unify AMD THR handler with MCA Polling
x86/mce: Unify AMD DFR handler with MCA Polling
x86/mce: Skip AMD threshold init if no threshold banks found
x86/mce/amd: Support SMCA Corrected Error Interrupt
x86/mce/apei: Handle variable register array size
EDAC/mce_amd: Add support for FRU Text in MCA

arch/x86/include/asm/mce.h | 24 +-
arch/x86/kernel/cpu/mce/amd.c | 461 ++++++++++++++----------
arch/x86/kernel/cpu/mce/apei.c | 124 +++++--
arch/x86/kernel/cpu/mce/core.c | 253 ++++++++-----
arch/x86/kernel/cpu/mce/dev-mcelog.c | 2 +-
arch/x86/kernel/cpu/mce/genpool.c | 20 +-
arch/x86/kernel/cpu/mce/inject.c | 4 +-
arch/x86/kernel/cpu/mce/internal.h | 13 +-
drivers/acpi/acpi_extlog.c | 2 +-
drivers/acpi/nfit/mce.c | 3 +-
drivers/edac/amd64_edac.c | 2 +-
drivers/edac/i7core_edac.c | 2 +-
drivers/edac/igen6_edac.c | 2 +-
drivers/edac/mce_amd.c | 29 +-
drivers/edac/pnd2_edac.c | 2 +-
drivers/edac/sb_edac.c | 2 +-
drivers/edac/skx_common.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 +-
drivers/ras/amd/fmpm.c | 2 +-
drivers/ras/cec.c | 3 +-
include/trace/events/mce.h | 51 +--
21 files changed, 620 insertions(+), 387 deletions(-)


base-commit: f382ab1037497f49d290ce6ceb9cdb10b186682e
--
2.34.1