Re: [PATCH] x86/mm: Don't try to change poison pages to uncacheable in a guest
From: Borislav Petkov
Date: Tue May 19 2020 - 04:50:09 EST
On Mon, May 18, 2020 at 11:26:29AM -0700, Luck, Tony wrote:
> That question only makes any sense if you know you are running as a
> guest and that someone else has unmapped the address. It's a meaningless
> question to ask if you are running bare metal. So we'd still have a check
> for FEATURE_HYPERVISOR
Maybe I'm not making myself clear enough here: I'm talking about using
a *special* MCE signature which says "your mapping has disappeared from
underneath you." Maybe a bit in MCi_MISC which the hw doesn't use. Or
some other deprecated bit, whatever.
If that signature is unique you won't have to check for hypervisor - you
*know* it comes from it.
Because the hypervisor would be telling the guest: "I have removed the
page from under you, you should act accordingly" with that signature. Versus
the kernel going with the unspecific "am I running as a guest?" check.
See the huge difference?
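To make that concrete, here is a rough, standalone sketch of what the guest
side could look like. Everything in it is an assumption for illustration:
MCI_MISC_HV_UNMAPPED is a made-up bit (nothing like it is defined today),
and struct mce_sketch and the helper names are hypothetical, not kernel
code. The only real thing is that bit 59 of MCi_STATUS (MISCV) says
whether MCi_MISC is valid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Real architectural bit: MCi_STATUS[59] means MCi_MISC holds valid data. */
#define MCI_STATUS_MISCV_SKETCH (1ULL << 59)

/* HYPOTHETICAL: a MCi_MISC bit the HV would set to say "I unmapped this
 * page". No such bit exists; bit 63 is picked arbitrarily for the sketch. */
#define MCI_MISC_HV_UNMAPPED (1ULL << 63)

/* Hypothetical, cut-down stand-in for the machine check record. */
struct mce_sketch {
	uint64_t status;
	uint64_t misc;
	uint64_t addr;
};

/* True only if the record carries the unique "mapping is gone" signature. */
static bool mce_page_unmapped_by_hv(const struct mce_sketch *m)
{
	if (!(m->status & MCI_STATUS_MISCV_SKETCH))
		return false;	/* MISC not valid, cannot trust the bit */

	return (m->misc & MCI_MISC_HV_UNMAPPED) != 0;
}

/* The guest skips the UC/CLFLUSH dance when the signature is present,
 * without ever asking "am I running as a guest?". */
static bool should_mark_page_uc(const struct mce_sketch *m)
{
	return !mce_page_unmapped_by_hv(m);
}
```

The point of the sketch: the decision keys off what the MCE itself says,
so bare metal never sees the signature and needs no special casing.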
> Maybe it isn't pretty. But I don't see another practical solution.
See above. Below too. I actually got two.
> The VMM is doing exactly the right thing here. It should not trust
> that the guest will behave and not touch the poison location again.
> If/when the guest does touch the poison, the right action is
> for the VMM to fake a new machine check to the guest.
Yes, and that new machine check should tell the guest: "do not CLFLUSH
this address - I've unmapped it and you don't have to do anything."
Basically what your hypervisor check does but *actually* stating why it
raised the second MCE.
> Theoretically the VMM could decode the instruction that the guest
> was trying to use on the poison page and decide "oh, this is that
> weird case in Linux where it's just trying to CLFLUSH the page. I'll
> just step the return IP past the CLFLUSH and let the guest continue".
... or not inject the second MCE at all.
That would be fixing it in the HV.
Because there's this other way to look at it and come to think of it,
fixing this in the HV makes a lot more sense. Why?
Well, let me elaborate:
The hypervisor just removed that page under the guest's feet and if that
hypervisor wants to support unenlightened guests which cannot even deal
with pages disappearing from under their feet, then that hypervisor
better not inject that second MCE.
Why would it even inject the MCE - what can the guest even do about
it? Exactly *nothing*. The page is unmapped and gone, the guest cannot
salvage any information anymore from it.
And yes, the hypervisor has *all* the information, it knows which page
it just removed so if the guest tries to access memory which the HV just
poisoned and which is within the range covered by that page, then it
should *not* inject that MCE. The guest can't handle it and why would it
inject it - it is an access to a poisoned page which the HV *knows*
won't succeed, so why bother?
The HV simply returns without injecting the MCE, and does the same for
every further access up to the end of the 4K page. It simply ignores
guest accesses to the poisoned page.
Without any guest changes.
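Roughly, the HV-side bookkeeping could look like the standalone sketch
below. All names (struct hv_poison_set, hv_should_inject_mce(), the
fixed-size table) are hypothetical and do not come from any real
hypervisor; the sketch only shows the "inject once, then stay quiet for
that 4K page" decision. What the HV does with the faulting instruction
afterwards is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* 4K pages */
#define SKETCH_MAX_POISONED	64	/* arbitrary table size for the sketch */

/* Hypothetical per-guest record of pages the HV has already taken away. */
struct hv_poison_set {
	uint64_t pfn[SKETCH_MAX_POISONED];
	size_t count;
};

/* Does this guest physical address fall into an already removed page? */
static bool hv_page_is_poisoned(const struct hv_poison_set *s, uint64_t gpa)
{
	uint64_t pfn = gpa >> SKETCH_PAGE_SHIFT;

	for (size_t i = 0; i < s->count; i++)
		if (s->pfn[i] == pfn)
			return true;

	return false;
}

/*
 * Decide whether to inject an MCE for a guest access that hit poison.
 * First hit on a page: remember the page and inject. Any later access
 * within the same 4K page: do not inject, the guest can do nothing.
 */
static bool hv_should_inject_mce(struct hv_poison_set *s, uint64_t gpa)
{
	if (hv_page_is_poisoned(s, gpa))
		return false;

	/* If the table is full the page is not tracked and we keep injecting. */
	if (s->count < SKETCH_MAX_POISONED)
		s->pfn[s->count++] = gpa >> SKETCH_PAGE_SHIFT;

	return true;
}
```

With that, the guest's CLFLUSH on the already removed page never raises
the second MCE, and accesses to any other page behave as before.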
> N.B. Linux wants to switch the page to uncacheable so that in the
> persistent memory case the filesystem code can continue to access
> the other "blocks" in the page, rather than lose all of them. That's
> futile in the case where the VMM took the whole 4K away. Maybe Dan
> needs to think about the guest case too.
Yes, if the 4K page just went away, marking it UC doesn't make any
sense.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette