Re: [PATCH v2 1/4] x86/mce: Add wrapper for struct mce to export vendor specific info

From: Borislav Petkov
Date: Wed Jun 26 2024 - 06:45:39 EST


On Tue, Jun 25, 2024 at 02:56:21PM -0500, Avadhut Naik wrote:
> Currently, exporting new additional machine check error information
> involves adding new fields for the same at the end of the struct mce.
> This additional information can then be consumed through mcelog or
> tracepoint.
>
> However, as new MSRs are being added (and will be added in the future)
> by CPU vendors on their newer CPUs with additional machine check error
> information to be exported, the size of struct mce will balloon on some
> CPUs, unnecessarily, since those fields are vendor-specific. Moreover,
> different CPU vendors may export the additional information in varying
> sizes.
>
> The problem particularly intensifies since struct mce is exposed to
> userspace as part of UAPI. Its bloating through vendor-specific data
> should be avoided to limit the information being sent out to userspace.
>
> Add a new structure mce_hw_err to wrap the existing struct mce. The same
> will prevent its ballooning since vendor-specific data, if any, can now be
> exported through a union within the wrapper structure and through
> __dynamic_array in mce_record tracepoint.
>
> Furthermore, new internal kernel fields can be added to the wrapper
> struct without impacting the user space API.
>
> Note: Some Checkpatch checks have been ignored to maintain coding style.
>
> [Yazen: Add last commit message paragraph.]
>
> Suggested-by: Borislav Petkov (AMD) <bp@xxxxxxxxx>
> Signed-off-by: Avadhut Naik <avadhut.naik@xxxxxxx>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> ---
> arch/x86/include/asm/mce.h | 6 +-
> arch/x86/kernel/cpu/mce/amd.c | 29 ++--
> arch/x86/kernel/cpu/mce/apei.c | 54 +++----
> arch/x86/kernel/cpu/mce/core.c | 178 +++++++++++++-----------
> arch/x86/kernel/cpu/mce/dev-mcelog.c | 2 +-
> arch/x86/kernel/cpu/mce/genpool.c | 20 +--
> arch/x86/kernel/cpu/mce/inject.c | 4 +-
> arch/x86/kernel/cpu/mce/internal.h | 4 +-
> drivers/acpi/acpi_extlog.c | 2 +-
> drivers/acpi/nfit/mce.c | 2 +-
> drivers/edac/i7core_edac.c | 2 +-
> drivers/edac/igen6_edac.c | 2 +-
> drivers/edac/mce_amd.c | 2 +-
> drivers/edac/pnd2_edac.c | 2 +-
> drivers/edac/sb_edac.c | 2 +-
> drivers/edac/skx_common.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
> drivers/ras/amd/fmpm.c | 2 +-
> drivers/ras/cec.c | 2 +-
> include/trace/events/mce.h | 42 +++---
> 20 files changed, 199 insertions(+), 162 deletions(-)
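
For illustration, the wrapper described in the commit message could look
roughly like the sketch below. This is a userspace approximation and not the
code from the patch: struct mce is cut down to a few fields, and the union
members (synd1, synd2) and both helpers (to_mce_hw_err, export_to_user) are
hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the UAPI struct mce; the real one has more fields. */
struct mce {
	uint64_t status;
	uint64_t addr;
	uint64_t misc;
	uint8_t  bank;
};

/*
 * Kernel-internal wrapper (layout assumed for illustration). It can grow
 * without changing the size or layout of struct mce seen by userspace.
 */
struct mce_hw_err {
	struct mce m;
	union {
		struct {
			uint64_t synd1;	/* hypothetical vendor-specific values */
			uint64_t synd2;
		} amd;
	} vendor;
};

/* Hypothetical helper: recover the wrapper from the embedded struct mce. */
static inline struct mce_hw_err *to_mce_hw_err(struct mce *m)
{
	assert(m != NULL);
	return (struct mce_hw_err *)((char *)m - offsetof(struct mce_hw_err, m));
}

/* Hypothetical helper: copy out only the UAPI part of the record. */
static inline void export_to_user(const struct mce_hw_err *err, struct mce *out)
{
	memcpy(out, &err->m, sizeof(*out));
}
```

Code that only has a struct mce pointer can get back to the vendor data via
to_mce_hw_err(), while paths that feed userspace copy just the embedded
struct mce, so the UAPI layout stays unchanged.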

Ok, did some minor massaging but otherwise looks ok now.

Tony, any comments? You ok with this, would that fit any Intel-specific vendor
fields too or do you need some additional Intel-specific changes?

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette