Re: [PATCH] x86/mm: Don't try to change poison pages to uncacheable in a guest
From: Luck, Tony
Date: Mon May 18 2020 - 11:36:29 EST
On Mon, May 18, 2020 at 03:48:13PM +0200, Borislav Petkov wrote:
> Hi,
>
> lemme try to reply to three emails at once.
>
> First of all, the two of you: pls do not top-post.
Sorry. Phone e-mail client is dumb.
> On Sat, May 16, 2020 at 6:52 PM Luck, Tony <tony.luck@xxxxxxxxx> wrote:
> > But the guest isn't likely to do the right thing with a page fault.
> > The guest just accessed a page that it knows is poisoned (VMM just told
> > it once that it was poisoned). There is no reason that the VMM should
> > let the guest actually touch the poison a second time. But if the guest
> > does, then the guest should get the expected response. I.e. another
> > machine check.
>
> So Jue says below that the VMM has unmapped the guest page from the EPT.
> So the guest cannot touch the poison anymore.
>
> How is it then possible for the guest to touch it again if the page is not
> mapped anymore?
The VMM wants to make sure that the guest can't touch the poison.
This is important because not every touch of poison results in a
recoverable machine check. If the guest's next touch is an unaligned
access that crosses from the poison cache line to a non-poisoned line,
then h/w will signal a fatal machine check and the whole machine will
go down.
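To make the line-crossing case concrete, here is a minimal sketch of the
address arithmetic. It assumes 64-byte cache lines, and the helper names
are made up for this illustration (they are not kernel functions):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u	/* assumption: 64-byte cache lines */

/* Start address of the cache line that contains addr. */
static uint64_t line_base(uint64_t addr)
{
	return addr & ~(uint64_t)(CACHE_LINE_SIZE - 1);
}

/* True if an access of len bytes at addr touches more than one line. */
static bool crosses_cache_line(uint64_t addr, size_t len)
{
	return len > 0 && line_base(addr) != line_base(addr + len - 1);
}

/*
 * True if the access touches the poisoned line and at least one other
 * line, i.e. the unaligned case described above. Hypothetical helper.
 */
static bool straddles_poison(uint64_t poison_addr, uint64_t addr, size_t len)
{
	uint64_t poison = line_base(poison_addr);

	if (!crosses_cache_line(addr, len))
		return false;
	return poison >= line_base(addr) && poison <= line_base(addr + len - 1);
}
```

With poison in the line at 0x1000, an 8-byte read at 0x103c covers
0x103c..0x1043 and so straddles the poisoned line and its neighbour,
while an 8-byte read at 0x1008 stays inside the poisoned line.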
> The guest should get a #PF when the page is unmapped and cause a new
> page to be mapped.
The VMM gets the page fault (because the unmapping of the guest
physical address is at the VMM EPT level). The VMM can't map a new
page into that guest physical address because it has no way to
replace the contents of the old page. The VMM could pass the #PF
to the guest, but that would just confuse the guest (its page tables
all say that the page is still valid). In this particular case the
page is part of the 1:1 kernel map. So the kernel will OOPS (I think).
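As a toy model of who sees the fault under two-level translation (a
deliberately simplified sketch: the enum and function names are invented
here, and real hardware also translates the guest page-table walk itself
through the EPT):

```c
#include <stdbool.h>

/* Hypothetical classification of where a faulting access is delivered. */
enum fault_target {
	NO_FAULT,		/* both levels map the page */
	GUEST_PAGE_FAULT,	/* #PF delivered to the guest kernel */
	VMM_EPT_VIOLATION	/* exit to the VMM; guest tables untouched */
};

static enum fault_target classify_access(bool guest_pte_present,
					 bool ept_entry_present)
{
	if (!guest_pte_present)
		return GUEST_PAGE_FAULT;	/* guest's own tables reject it */
	if (!ept_entry_present)
		return VMM_EPT_VIOLATION;	/* only the VMM learns of this */
	return NO_FAULT;
}
```

The poison case is classify_access(true, false): the guest's tables still
say the page is valid, so the only party that sees the fault is the VMM.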
> On Sun, May 17, 2020 at 07:36:00AM -0700, Jue Wang wrote:
> > The stack is from guest MCE handler's processing of the first MCE injected.
>
> Aha, so you've flipped the functions order in the trace. It all starts
> at
>
> set_mce_nospec(m->addr >> PAGE_SHIFT);
>
> Now it makes sense.
>
> > Note before the first MCE is injected into guest (i.e., after the host MCE
> > handler successfully finished MCE handling and notified VMM via SIGBUS with
> > BUS_MCEERR_AR), VMM unmaps the guest page from EPT.
>
> Ok, good.
>
> > The guest MCE handler finished the "normal" MCE handling and recovery
> > (memory_failure() in mm/memory-failure.c) successfully, it's the aftermath
> > below leading to the stack trace:
> > https://github.com/torvalds/linux/blob/5a9ffb954a3933d7867f4341684a23e008d6839b/arch/x86/kernel/cpu/mce/core.c#L1101
>
> On Sun, May 17, 2020 at 08:33:00AM -0700, Jue Wang wrote:
> > In other words, it's the *do_memory_failure -> set_mce_nospec* flow of
> > guest MCE handler acting on the *first* MCE injection. As a result, the
> > guest panics and resets *whenever* there is an MCE injected, even when the
> > injected MCE is recoverable.
>
> So IIUC that set_mce_nospec() thing should check whether m->addr is
> mapped first and only then mark it _uc and whatever monkey business it
> does. Not this blanket
Please explain how a guest (that doesn't even know that it is a guest)
is going to figure out that the EPT tables (that it has no way to access)
have marked this page invalid in guest physical address space.
> if am I a guest?
>
> test.
>
> Imagine a hypervisor which doesn't set X86_FEATURE_HYPERVISOR, i.e.,
> CPUID(1)[ECX, bit 31]?
Guest is going to be screwed in this case.
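For reference, the only hint a guest normally has is the
hypervisor-present bit, CPUID leaf 1, ECX bit 31. A user-space sketch of
that check, assuming GCC or clang's <cpuid.h> on x86 (the function names
are made up for this example; this is not the kernel's code):

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

#define CPUID1_ECX_HYPERVISOR (1u << 31)	/* CPUID.1:ECX bit 31 */

/* Pure helper: does this ECX value from CPUID leaf 1 have the bit set? */
static bool ecx_has_hypervisor_bit(uint32_t ecx)
{
	return (ecx & CPUID1_ECX_HYPERVISOR) != 0;
}

/*
 * Returns true only if the CPU reports a hypervisor. A VMM that hides
 * the bit (or a non-x86 build) makes this return false, which is the
 * "guest is screwed" case: it cannot tell that it is a guest.
 */
static bool running_under_hypervisor(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;	/* leaf 1 not available */
	return ecx_has_hypervisor_bit(ecx);
#else
	return false;		/* unknown on this architecture */
#endif
}
```

Note that this only reflects what the VMM chooses to advertise; it says
nothing about what the EPT currently maps.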
-Tony