Re: [PATCH v2 1/2] x86/mce/AMD: Redo use of SMCA MCA_DE{STAT,ADDR} registers

From: Borislav Petkov
Date: Wed Apr 05 2017 - 12:46:52 EST


On Wed, Apr 05, 2017 at 02:52:08PM +0000, Ghannam, Yazen wrote:
> Yes, BIOS does and should set this bit. If it doesn't for some reason
> then our handling in the Kernel will still be functional. We just
> won't find Deferred errors in MCA_STATUS.

Ok.

> I'd rather we keep as many checks as possible out of __log_error().

What checks?

> Your suggestion gave me an idea. Let's drop __log_error_deferred() and just
> select the correct registers in the deferred error interrupt handler.
>
> /*
> * APIC interrupt handler for deferred errors
> *
> * We have three scenarios for checking for Deferred errors.
> * 1) Non-SMCA systems check MCA_STATUS and log error if found.
> * 2) SMCA systems check MCA_STATUS. If error is found then log it and also
> * clear MCA_DESTAT.
> * 3) SMCA systems check MCA_DESTAT, if error was not found in MCA_STATUS, and
> * log it.
> */
> static void amd_deferred_error_interrupt(void)
> {
> unsigned int bank;
> u64 status;
>
> for (bank = 0; bank < mca_cfg.banks; ++bank) {
> rdmsrl(msr_ops.status(bank), status);
>
> if (is_deferred_error(status)) {
> __log_error(bank, msr_ops.status(bank), msr_ops.addr(bank), 0);

So we're an SMCA box and we land here on a deferred error, we don't have
anything in the standard MSRs...

> /* Clear MCA_DESTAT even if we used MCA_STATUS. */
> if (mce_flags.smca)
> wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);

... and here we clear the info which we wanted to log before we log it!

>
> } else if (mce_flags.smca) {
> rdmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), status);
>
> if (is_deferred_error(status))
> __log_error(bank, MSR_AMD64_SMCA_MCx_DESTAT(bank), MSR_AMD64_SMCA_MCx_DEADDR(bank), 0);

So we execute __log_error() twice on an SMCA box for a deferred error.

So no, I'd like to see clear code flow which does what the guidance is.
And if you feel it is not readable enough then you restructure more. And
I gave you a perfectly fine __log_error_deferred() which actually does
what the guidance is.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.