Re: [RFC PATCH] x86/mce: Make mce_rdmsrl() do a plain RDMSR only

From: Borislav Petkov
Date: Tue Sep 08 2020 - 05:47:03 EST


On Mon, Sep 07, 2020 at 01:06:22PM -0700, Luck, Tony wrote:
> Digging into the history it seems that this rdmsrl_safe() was added for
> a possible bug on a pentiumIII back in 2009 that was eventually closed
> as "unreproducible".

So this is the $ 10^6 question so far: if I can assume that those two
boxes we found in bug reports which triggered this warn because they
failed the MSR read even though those MSRs are MCA architectural, if I
can assume that those are a one-off thing and that we won't support such
and we'd let machines like that crash and burn in testing, hopefully
before they get released to the public, then, we can do the plain MSR
read there.

No, tracing, no exception handling.

HOWEVER!

*If* that non-existent MSR read happens after all, then the machine
would #GP.

Our #GP handler, however, doesn't panic unless panic_on_oops, as Andy
pointed out on IRC yesterday and this makes a nasty situation even
worse.

Now, even if it did panic, reporting this would not be very
user-friendly because people would have to look at the Code: output of
the splat to see which insn rIP pointed to and then figure out what
happened.

Now, we can look at the code bytes around rIP ourselves, in the #GP
handler, and decide to panic if 0F 32, i.e., RDMSR opcode.

But that is ewww ugly to say the least.

So, Andy suggested we do a simple .fixup so that when the RDMSR fails,
in the fixup we panic directly.

The problem there is:

NMI -> #MC -> #GP ... IRET

Another #MC won't happen after we IRET because MSR_IA32_MCG_STATUS is
not cleared yet but the state is mightily confused and you don't want to
run code on that box anymore but panic and not move.

> Do we have three distinct classes of calls to RDMSR now?
>
>
> 1) This case. Architecture checks have been made. This call can't
> fail. We are calling from a tricky state (on IST) so no tracing
> allowed.

*If* we can hold hw people to the architectural guarantee, then this
would be the cleanest fix.

> 2) Normal case ... architecture checks have been done so shouldn't
> fail. Safe state for tracing etc. Use rdmsrl().

Yeah, but see above.

> 3) Probe case. Architecture didn't provide definitive enumeration,
> so we might take a #GP. Use the rdmsrl_safe().

... and that is bad because above *and* tracing can't handle atomic
context properly.

> If mce subsystem is the only instance of case "1", then this
> inline is OK. If there are others then rather than inlining
> the asm here, we should have some new rdmsrl_notrace() inline
> function declared for all to use.

Yeah, Ingo suggested the same thing but let's solve the bigger issue
first.

> Needs a:
>
> Fixes: 11868a2dc4f5 ("x86: mce: Use safer ways to access MCE registers")
>
> since this is undoing an earlier change.

Ok.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette