Re: [PATCH v2 4/4] x86/mce: Add AMD SMCA support to SRAO notifier

From: Borislav Petkov
Date: Wed Mar 22 2017 - 17:13:57 EST


On Mon, Mar 20, 2017 at 03:26:54PM -0500, Yazen Ghannam wrote:
> From: Yazen Ghannam <yazen.ghannam@xxxxxxx>
>
> Deferred errors on AMD systems may get an Action Optional severity with the
> goal of being handled by the SRAO notifier block. However, the process of
> determining if an address is usable is different between Intel and AMD. So
> define vendor-specific functions for this.
>
> Also, check for the AO severity before determining if an address is usable
> to possibly save some cycles.
>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> ---
> Link: http://lkml.kernel.org/r/1486760120-60944-3-git-send-email-Yazen.Ghannam@xxxxxxx
>
> v1->v2:
> - New in v2. Based on v1 patch 3.
> - Update SRAO notifier block to handle errors from SMCA systems.
>
> arch/x86/kernel/cpu/mcheck/mce.c | 52 ++++++++++++++++++++++++++++++----------
> 1 file changed, 40 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 5e365a2..1a2669d 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -547,19 +547,49 @@ static void mce_report_event(struct pt_regs *regs)
> * be somewhat complicated (e.g. segment offset would require an instruction
> * parser). So only support physical addresses up to page granuality for now.
> */
> -static int mce_usable_address(struct mce *m)
> +static int mce_usable_address_intel(struct mce *m, unsigned long *pfn)

So this function is basically saying whether the address is usable but
then it is also returning it if so.

And then it is using an I/O argument. Yuck.

So it sounds to me like this functionality needs redesign: something
like have a get_usable_address() function (the "mce_" prefix is not
really needed as it is static) which returns an invalid value when it
determines that it doesn't have a usable address and the address itself
if it succeeds.

> {
> - if (!(m->status & MCI_STATUS_MISCV) || !(m->status & MCI_STATUS_ADDRV))
> + if (!(m->status & MCI_STATUS_MISCV))
> return 0;
> -
> - /* Checks after this one are Intel-specific: */
> - if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> - return 1;
> -
> if (MCI_MISC_ADDR_LSB(m->misc) > PAGE_SHIFT)
> return 0;
> if (MCI_MISC_ADDR_MODE(m->misc) != MCI_MISC_ADDR_PHYS)
> return 0;
> +
> + *pfn = m->addr >> PAGE_SHIFT;
> + return 1;
> +}
> +
> +/* Only support this on SMCA systems and errors logged from a UMC. */
> +static int mce_usable_address_amd(struct mce *m, unsigned long *pfn)
> +{
> + u8 umc;
> + u16 nid = cpu_to_node(m->extcpu);
> + u64 addr;
> +
> + if (!mce_flags.smca)
> + return 0;

So on !SMCA systems there'll be no usable address ever! Even with
MCI_STATUS_ADDRV set.

Please *test* your stuff on all affected hardware before submitting.

> +
> + umc = find_umc_channel(m);
> +
> + if (umc < 0 || umc_normaddr_to_sysaddr(m->addr, nid, umc, &addr))
> + return 0;
> +
> + *pfn = addr >> PAGE_SHIFT;
> + return 1;
> +}
> +
> +static int mce_usable_address(struct mce *m, unsigned long *pfn)
> +{
> + if (!(m->status & MCI_STATUS_ADDRV))
> + return 0;

What happened to the MCI_STATUS_MISCV bit check?

> + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
> + return mce_usable_address_intel(m, pfn);
> +
> + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
> + return mce_usable_address_amd(m, pfn);
> +
> return 1;

We definitely don't want to say that the address is usable on a third
vendor. It would be most likely a lie even if we never reach this code
on a third vendor.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--