RE: [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export vendor specific info
From: Zhuo, Qiuxu
Date: Wed Oct 23 2024 - 22:21:52 EST
> From: Avadhut Naik <avadhut.naik@xxxxxxx>
> [...]
> Subject: [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export
> vendor specific info
>
> Currently, exporting new additional machine check error information involves
> adding new fields for the same at the end of the struct mce.
> This additional information can then be consumed through mcelog or
> tracepoint.
>
> However, as new MSRs are being added (and will be added in the future) by
> CPU vendors on their newer CPUs with additional machine check error
> information to be exported, the size of struct mce will balloon on some CPUs,
> unnecessarily, since those fields are vendor-specific. Moreover, different CPU
> vendors may export the additional information in varying sizes.
>
> The problem particularly intensifies since struct mce is exposed to userspace
> as part of UAPI. Its bloating through vendor-specific data should be avoided
> to limit the information being sent out to userspace.
>
> Add a new structure mce_hw_err to wrap the existing struct mce. The same
> will prevent its ballooning since vendor-specific data, if any, can now be
> exported through a union within the wrapper structure and through
> __dynamic_array in mce_record tracepoint.
>
> Furthermore, new internal kernel fields can be added to the wrapper struct
> without impacting the user space API.
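
For readers following the thread, the wrapper idea described above can be sketched in userspace C. This is only an illustration under stated assumptions: the struct mce here is a cut-down stand-in for the real UAPI structure, the synd1/synd2 members are hypothetical vendor fields, and mce_hw_err_init() and mce_export_uapi() are made-up helper names that do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the UAPI record; the real struct mce has more fields. */
struct mce {
	uint64_t status;
	uint64_t addr;
	uint8_t bank;
};

/* Wrapper kept kernel-internal; only the embedded struct mce is UAPI. */
struct mce_hw_err {
	struct mce m;
	union {
		struct {
			uint64_t synd1; /* hypothetical vendor-specific value */
			uint64_t synd2; /* hypothetical vendor-specific value */
		} amd;
	} vendor;
};

/* Hypothetical helper: zero the whole wrapper, then fill the generic part. */
static void mce_hw_err_init(struct mce_hw_err *err, uint8_t bank, uint64_t status)
{
	assert(err != NULL);
	memset(err, 0, sizeof(*err));
	err->m.bank = bank;
	err->m.status = status;
}

/* Hypothetical helper: copy out only the UAPI-visible part of the record. */
static size_t mce_export_uapi(const struct mce_hw_err *err, struct mce *out)
{
	*out = err->m;
	return sizeof(*out);
}
```

The point of the layout is that sizeof(struct mce) stays fixed no matter how many members the vendor union gains, so consumers of the exported record are unaffected.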
>
> [Yazen: Add last commit message paragraph.]
>
> Suggested-by: Borislav Petkov (AMD) <bp@xxxxxxxxx>
> Signed-off-by: Avadhut Naik <avadhut.naik@xxxxxxx>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> Signed-off-by: Avadhut Naik <avadhut.naik@xxxxxxx>
> ---
> Changes in v2:
> [1] https://lore.kernel.org/linux-edac/20240521125434.1555845-1-yazen.ghannam@xxxxxxx/
> [2] https://lore.kernel.org/linux-edac/20240523155641.2805411-1-yazen.ghannam@xxxxxxx/
>
> 1. Drop dependencies on sets [1] and [2] above and rebase on top of
> tip/master.
>
> Changes in v3:
> 1. Move wrapper changes required in mce_read_aux() and
> mce_no_way_out() to this patch from the second patch.
> 2. Fix SoB chain to properly reflect the patch path.
>
> Changes in v4:
> 1. Rebase on top of tip/master to avoid merge conflicts.
> 2. Resolve kernel test robot's warning on the use of memset() in
> do_machine_check().
>
> Changes in v5:
> 1. No changes except rebasing on top of tip/master.
>
> Changes in v6:
> 1. Rebase on top of tip/master.
> 2. Introduce to_mce_hw_err macro to eliminate changes required in notifier
> chain callback functions, especially callback functions of EDAC drivers.
> 3. Change third parameter of __mc_scan_banks() to a pointer to the new
> wrapper structure and make the required changes accordingly.
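
The to_mce_hw_err approach mentioned in the v6 notes can be sketched as follows. This is a userspace approximation, not the patch itself: container_of() is re-implemented here without the kernel's type checking, internal_flags is a hypothetical kernel-internal field, and notifier_cb() is a made-up callback standing in for an EDAC notifier that still receives a plain struct mce pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the UAPI record. */
struct mce {
	uint64_t status;
	uint8_t bank;
};

struct mce_hw_err {
	uint64_t internal_flags; /* hypothetical kernel-internal field */
	struct mce m;
};

/* Userspace approximation of the kernel's container_of(), no type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the wrapper from a pointer to its embedded struct mce. */
#define to_mce_hw_err(mce) container_of(mce, struct mce_hw_err, m)

/* Hypothetical callback: the signature still takes only struct mce *. */
static uint64_t notifier_cb(struct mce *m)
{
	assert(m != NULL);
	struct mce_hw_err *err = to_mce_hw_err(m);

	return err->internal_flags;
}
```

Because the wrapper is recovered from the embedded member, callbacks can keep their existing signature. The conversion is only valid when the struct mce passed in really is embedded in a struct mce_hw_err.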
>
> Changes in v7:
> 1. Rebase on top of tip/master.
> 2. Fix initialization of struct mce_hw_err *final in do_machine_check().
As my comments were resolved in v6 and v7,
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@xxxxxxxxx>
-Qiuxu