Re: [PATCH v2 1/2] x86/mce/AMD: Redo use of SMCA MCA_DE{STAT,ADDR} registers

From: Borislav Petkov
Date: Wed Apr 05 2017 - 09:40:17 EST


On Tue, Apr 04, 2017 at 12:24:31PM -0500, Yazen Ghannam wrote:
> From: Yazen Ghannam <yazen.ghannam@xxxxxxx>
>
> We have support for the new SMCA MCA_DE{STAT,ADDR} registers in Linux. So
> we've used these registers in place of MCA_{STATUS,ADDR} on SMCA systems.
> However, the guidance for current implementations of SMCA is to continue
> using MCA_{STATUS,ADDR} and to use MCA_DE{STAT,ADDR} only if a Deferred
> error was not found in the former registers. This also means we shouldn't
> clear MCA_CONFIG[LogDeferredInMcaStat].

Does that mean we should trust what BIOS has set it to?

>
> Redo the AMD Deferred error interrupt handler to follow the guidance for
> current SMCA systems. Also, don't break after finding the first error.
>
> Rework __log_error() for clarity.
>
> Don't clear MCA_CONFIG[LogDeferredInMcaStat] during AMD mcheck init.
>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> ---

...

> @@ -832,25 +821,29 @@ asmlinkage __visible void __irq_entry smp_trace_deferred_error_interrupt(void)
> exiting_ack_irq();
> }
>
> +static inline bool is_deferred_error(u64 status)
> +{
> + return ((status & MCI_STATUS_VAL) && (status & MCI_STATUS_DEFERRED));
> +}
> +
> /* APIC interrupt handler for deferred errors */
> static void amd_deferred_error_interrupt(void)
> {
> unsigned int bank;
> - u32 msr_status;
> u64 status;
>
> for (bank = 0; bank < mca_cfg.banks; ++bank) {
> - msr_status = (mce_flags.smca) ? MSR_AMD64_SMCA_MCx_DESTAT(bank)
> - : msr_ops.status(bank);
> + rdmsrl(msr_ops.status(bank), status);
>
> - rdmsrl(msr_status, status);
> + if (is_deferred_error(status)) {
> + __log_error_deferred(bank, false);
>
> - if (!(status & MCI_STATUS_VAL) ||
> - !(status & MCI_STATUS_DEFERRED))
> - continue;
> + } else if (mce_flags.smca) {
> + rdmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), status);
>
> - __log_error(bank, true, false, 0);
> - break;
> + if (is_deferred_error(status))
> + __log_error_deferred(bank, true);
> + }

The diff is very hard to parse, lemme paste the whole resulting function
after the patch:

> /* APIC interrupt handler for deferred errors */
> static void amd_deferred_error_interrupt(void)
> {
> unsigned int bank;
> u64 status;
>
> for (bank = 0; bank < mca_cfg.banks; ++bank) {
> rdmsrl(msr_ops.status(bank), status);
>
> if (is_deferred_error(status)) {
> __log_error_deferred(bank, false);
>
> } else if (mce_flags.smca) {
> rdmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), status);
>
> if (is_deferred_error(status))
> __log_error_deferred(bank, true);

So this doesn't sound like the guidance: if we should fallback to the
DE* versions only if a deferred error wasn't found in the usual regs,
the logic should be:

static void __log_error_deferred(unsigned int bank)
{
u32 msr_stat = msr_ops.status(bank);
u32 msr_addr = msr_ops.addr(bank);

/*
* __log_error() needs to return true to say it has logged successfully or false
* if it hasn't.
*/
if (__log_error(bank, msr_stat, msr_addr, 0))
goto out;

if (!mce_flags.smca)
return;

/*
* Couldn't log a deferred error from the usual regs, fallback to DE*
* variants.
*/
msr_stat = MSR_AMD64_SMCA_MCx_DESTAT(bank);
msr_addr = MSR_AMD64_SMCA_MCx_DEADDR(bank);

__log_error(bank, msr_stat, msr_addr, 0);

out:
/* We should still clear MCA_DESTAT even if we used MCA_STATUS. */
if (mce_flags.smca)
wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
}

and this way you don't need to wiggle around that use_smca_destat thing.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.