Re: Fwd: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
From: Borislav Petkov
Date: Thu Oct 09 2014 - 13:35:49 EST
On Thu, Oct 09, 2014 at 11:53:39AM -0500, Aravind Gopalakrishnan wrote:
> How do you mean "last error"?
> The interrupt is only fired upon overflow..
And? Think about it, what is causing the overflow? A CE, right?
There was even a call to machine_check_poll() there which we removed,
but for another reason. In any case, you should have the error signature
in the MCA banks of the last error causing the overflow, right? This is
what I mean with last error.
However(!),...
> CE error if collected through polling gives proper decoding info. So,
> why should this be any different for the same CE error for which an
> interrupt is generated on crossing a threshold?
... we're currently using a special signature to signal the overflow
with the K8_MCE_THRESHOLD_BASE thing. You simply report a special bank
and this way you can tell userspace that this is an overflow error. I
think that was the reason behind the software-defined banks.
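
For illustration, a userspace tool could recognise such a record by its bank number alone. This is only a sketch: the two constants are what I recall from the kernel's mce.h and should be checked against your tree, and bank_is_threshold_event() is a made-up helper name, not an existing API.

```c
#include <stdbool.h>

/* Assumed values, recalled from arch/x86 mce.h; verify against your tree. */
#define MCE_EXTENDED_BANK	128
#define K8_MCE_THRESHOLD_BASE	(MCE_EXTENDED_BANK + 1)

/*
 * Hypothetical helper: a bank number at or above K8_MCE_THRESHOLD_BASE
 * is not a hardware bank but a software-defined threshold record.
 */
static bool bank_is_threshold_event(unsigned int bank)
{
	return bank >= K8_MCE_THRESHOLD_BASE;
}
```

The downside is visible right there: every tool has to carry these magic numbers around.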
Now, we can also drop that and simply log a normal error but make sure
MASK_OVERFLOW_HI is passed onto userspace so that it can see that the
error is an overflow error. I.e., something like this:
	mce_setup(&m);

	// rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus); - not sure about this one,
	// we're not looking at MCGSTATUS for CEs

	// rdmsrl(address, m.misc); - this MSR can be saved too as we're reading
	// the MISC register already.

	rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
	m.bank = bank;
	mce_log(&m);
so in the end it'll be something like this:
	mce_setup(&m);
	/* widen before shifting so the upper half isn't lost if high is 32-bit */
	m.misc = ((u64)high << 32) | low;
	rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
	m.bank = bank;
	mce_log(&m);
so I'm still on the fence about what we want to do and am expecting
arguments. I like the last one more because it is simpler and tools
don't need to know about the software-defined banks.
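
To make the userspace side of that concrete, here is a rough sketch of how a tool could spot the overflow from the logged MISC value instead of from a special bank number. The MASK_OVERFLOW_HI value is what I recall from mce_amd.c and should be treated as an assumption, and both helper names are hypothetical. Note the widening cast before the shift: shifting a 32-bit value left by 32 is undefined in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: overflow bit within the high 32 bits of the threshold MISC MSR. */
#define MASK_OVERFLOW_HI	0x00010000u

/* Hypothetical helper: join the two 32-bit halves read from the MSR. */
static uint64_t misc_from_halves(uint32_t high, uint32_t low)
{
	/* Widen first; (high << 32) on a 32-bit type would be undefined. */
	return ((uint64_t)high << 32) | low;
}

/* Hypothetical helper: does this logged MISC value flag a threshold overflow? */
static bool misc_is_threshold_overflow(uint64_t misc)
{
	return ((uint32_t)(misc >> 32) & MASK_OVERFLOW_HI) != 0;
}
```

With something like this, the record carries the real bank number and the tool only needs to look at one bit of MISC.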
Thanks.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.