RE: [PATCH v2 1/2] x86/mce/AMD: Redo use of SMCA MCA_DE{STAT,ADDR} registers
From: Ghannam, Yazen
Date: Wed Apr 05 2017 - 13:06:42 EST
> -----Original Message-----
> From: Borislav Petkov [mailto:bp@xxxxxxxxx]
> Sent: Wednesday, April 05, 2017 12:45 PM
> To: Ghannam, Yazen <Yazen.Ghannam@xxxxxxx>
> Cc: linux-edac@xxxxxxxxxxxxxxx; Tony Luck <tony.luck@xxxxxxxxx>;
> x86@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v2 1/2] x86/mce/AMD: Redo use of SMCA
> MCA_DE{STAT,ADDR} registers
>
>
> > I'd rather we keep as many checks as possible out of __log_error().
>
> What checks?
>
Checking whether we have a valid deferred error. Since we also call
__log_error() on thresholding interrupts, we would need to tell it which
handler is calling so that it can do the correct check. This is what we
currently do.

> > Your suggestion gave me an idea. Let's drop __log_error_deferred() and
> > just select the correct registers in the deferred error interrupt handler.
> >
> > /*
> > * APIC interrupt handler for deferred errors
> > *
> > * We have three scenarios for checking for Deferred errors.
> > * 1) Non-SMCA systems check MCA_STATUS and log error if found.
> > * 2) SMCA systems check MCA_STATUS. If error is found then log it and also
> > * clear MCA_DESTAT.
> > * 3) SMCA systems check MCA_DESTAT, if error was not found in MCA_STATUS,
> > * and log it.
> > */
> > static void amd_deferred_error_interrupt(void)
> > {
> > unsigned int bank;
> > u64 status;
> >
> > for (bank = 0; bank < mca_cfg.banks; ++bank) {
> > rdmsrl(msr_ops.status(bank), status);
> >
> > if (is_deferred_error(status)) {
> > __log_error(bank, msr_ops.status(bank),
> > msr_ops.addr(bank), 0);
>
> So we're an SMCA box and we land here on a deferred error, we don't have
> anything in the standard MSRs...
>
What do you mean by "we don't have anything"? We check whether we have a valid
deferred error in is_deferred_error(). Otherwise, we don't log anything.
> > /* Clear MCA_DESTAT even if we used MCA_STATUS. */
> > if (mce_flags.smca)
> > wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
>
> ... and here we clear the info which we wanted to log before we log it!
>
No, we don't. If we don't have a valid deferred error in MCA_STATUS, then we
don't get here.
> >
> > } else if (mce_flags.smca) {
> > rdmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), status);
> >
> > if (is_deferred_error(status))
> > __log_error(bank,
> > MSR_AMD64_SMCA_MCx_DESTAT(bank),
> > MSR_AMD64_SMCA_MCx_DEADDR(bank), 0);
>
> So we execute __log_error() twice on an SMCA box for a deferred error.
>
No we don't. This is an if/else-if statement.
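
To make the control flow concrete, here is a minimal user-space sketch of the
handler logic quoted above. It is not kernel code: the arrays stand in for the
MSRs, NR_BANKS, log_error(), logged[] and demo() are hypothetical names made up
for illustration, and it assumes that is_deferred_error() tests the Valid
(bit 63) and Deferred (bit 44) bits and that logging clears the register it
read from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_BANKS        4               /* hypothetical bank count */
#define STATUS_VAL      (1ULL << 63)    /* Valid bit */
#define STATUS_DEFERRED (1ULL << 44)    /* Deferred bit */

/* Plain arrays standing in for the MCA_STATUS and MCA_DESTAT MSRs. */
static uint64_t mca_status[NR_BANKS];
static uint64_t mca_destat[NR_BANKS];
static bool smca = true;                /* pretend this is an SMCA system */
static unsigned int logged[NR_BANKS];   /* log calls per bank */

/* Assumed check: a deferred error needs both Valid and Deferred set. */
static bool is_deferred_error(uint64_t status)
{
	return (status & STATUS_VAL) && (status & STATUS_DEFERRED);
}

/* Stand-in for __log_error(): count the call, then clear the source register. */
static void log_error(unsigned int bank, uint64_t *status_reg)
{
	logged[bank]++;
	*status_reg = 0;
}

void deferred_error_interrupt(void)
{
	unsigned int bank;

	for (bank = 0; bank < NR_BANKS; ++bank) {
		if (is_deferred_error(mca_status[bank])) {
			/* Scenarios 1 and 2: error found in MCA_STATUS. */
			log_error(bank, &mca_status[bank]);

			/* Clear MCA_DESTAT even if we used MCA_STATUS. */
			if (smca)
				mca_destat[bank] = 0;
		} else if (smca) {
			/* Scenario 3: nothing in MCA_STATUS, so check MCA_DESTAT. */
			if (is_deferred_error(mca_destat[bank]))
				log_error(bank, &mca_destat[bank]);
		}
	}
}

/* Bank 0 has the error in both registers, bank 1 only in MCA_DESTAT. */
void demo(void)
{
	mca_status[0] = STATUS_VAL | STATUS_DEFERRED;
	mca_destat[0] = STATUS_VAL | STATUS_DEFERRED;
	mca_destat[1] = STATUS_VAL | STATUS_DEFERRED;

	deferred_error_interrupt();

	/* Each affected bank is logged exactly once; untouched banks never. */
	assert(logged[0] == 1 && logged[1] == 1 && logged[2] == 0);
}
```

Because the two branches are mutually exclusive, a bank is logged at most once
per interrupt, and MCA_DESTAT is only cleared without being logged in the
branch where the same error was already found in MCA_STATUS.
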
Thanks,
Yazen