Re: Fwd: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it

From: Aravind Gopalakrishnan
Date: Thu Oct 09 2014 - 15:01:25 EST


On 10/9/2014 12:35 PM, Borislav Petkov wrote:
On Thu, Oct 09, 2014 at 11:53:39AM -0500, Aravind Gopalakrishnan wrote:
How do you mean "last error"?
The interrupt is only fired upon overflow.
And? Think about it, what is causing the overflow? A CE, right?

There was even a call to machine_check_poll() there which we removed,
but for another reason. In any case, you should have the error signature
in the MCA banks of the last error causing the overflow, right?

Right. I was not arguing that we shouldn't; I just wasn't clear on what you meant.
Anyway, thanks for clarifying.

This is what I mean with last error.

However(!),...

A CE collected through polling gives proper decoding info. So why
should this be any different for the same CE when an interrupt is
generated on crossing a threshold?
... we're currently using a special signature to signal the overflow
with the K8_MCE_THRESHOLD_BASE thing. You simply report a special bank
and this way you can tell userspace that this is an overflow error. I
think that was the reason behind the software-defined banks.
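
For readers unfamiliar with the software-defined banks, here is a minimal user-space sketch of how such a bank number could be built and taken apart again. The constants (MCE_EXTENDED_BANK = 128, NR_BLOCKS = 9) and the helper names are assumptions for illustration only; they are not quoted from this thread and should be checked against the kernel tree in question.

```c
#include <assert.h>

/* Assumed values, for illustration only; verify against the kernel headers. */
#define MCE_EXTENDED_BANK     128
#define K8_MCE_THRESHOLD_BASE (MCE_EXTENDED_BANK + 1)
#define NR_BLOCKS             9

/* Hypothetical helper: map a (bank, block) pair to a software-defined bank id. */
static unsigned int threshold_bank_id(unsigned int bank, unsigned int block)
{
	return K8_MCE_THRESHOLD_BASE + bank * NR_BLOCKS + block;
}

/* Hypothetical helper: a tool has to know this range to spot overflow records. */
static int is_software_defined_bank(unsigned int id)
{
	return id >= K8_MCE_THRESHOLD_BASE;
}

/* Hypothetical helper: recover the hardware bank and block from the id. */
static void threshold_bank_decode(unsigned int id,
				  unsigned int *bank, unsigned int *block)
{
	assert(is_software_defined_bank(id));
	*bank  = (id - K8_MCE_THRESHOLD_BASE) / NR_BLOCKS;
	*block = (id - K8_MCE_THRESHOLD_BASE) % NR_BLOCKS;
}
```

The point of the sketch is that any consumer of the log has to carry the same constants to make sense of the bank number, which is the coupling the alternative below would remove.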

Now, we can also drop that and simply log a normal error but make sure
MASK_OVERFLOW_HI is passed onto userspace so that it can see that the
error is an overflow error. I.e., something like this:

mce_setup(&m);
// rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus); - not sure about this one - we're not looking at MCGSTATUS for CEs
That's right. Might as well remove it.

// rdmsrl(address, m.misc); - this MSR read can be saved too, as we've
// already read the MISC register.
rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
m.bank = bank;
mce_log(&m);

so in the end it'll be something like this:

mce_setup(&m);
m.misc = ((u64)high << 32) | low;	/* widen first: high is a 32-bit value */
rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
m.bank = bank;
mce_log(&m);

so I'm still on the fence about what we want to do and am expecting
arguments.
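
To illustrate what "make sure MASK_OVERFLOW_HI is passed onto userspace" would mean for a consumer, here is a small user-space sketch under stated assumptions: the mask value 0x00010000 (bit 16 of the high half, i.e. bit 48 of the full MISC value) is assumed rather than taken from this thread, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mask: overflow flag in the high 32 bits of the threshold MISC MSR. */
#define MASK_OVERFLOW_HI 0x00010000u

/* Combine the two 32-bit halves of an MSR read into one 64-bit value.
 * The cast matters: shifting a 32-bit value by 32 is undefined in C. */
static uint64_t misc_from_halves(uint32_t high, uint32_t low)
{
	return ((uint64_t)high << 32) | low;
}

/* What a tool could do with the logged misc field instead of checking
 * for a software-defined bank number. */
static int misc_is_overflow(uint64_t misc)
{
	return ((uint32_t)(misc >> 32) & MASK_OVERFLOW_HI) != 0;
}
```

With this, the record keeps the real bank number and STATUS signature, and the overflow condition is still recoverable from the misc field alone.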

I actually agree with this approach. So no argument:)
I like the last one more because it is simpler and tools
don't need to know about the software-defined banks.


Thanks
-Aravind.
