Re: [PATCH] x86/mm: Don't try to change poison pages to uncacheable in a guest

From: Borislav Petkov
Date: Mon May 18 2020 - 09:48:24 EST


Hi,

lemme try to reply to three emails at once.

First of all, the two of you: pls do not top-post.

On Sat, May 16, 2020 at 6:52 PM Luck, Tony <tony.luck@xxxxxxxxx> wrote:
> But the guest isn't likely to do the right thing with a page fault.
> The guest just accessed a page that it knows is poisoned (VMM just told
> it once that it was poisoned). There is no reason that the VMM should
> let the guest actually touch the poison a second time. But if the guest
> does, then the guest should get the expected response. I.e. another
> machine check.

So Jue says below that the VMM has unmapped the guest page from the EPT.
So the guest cannot touch the poison anymore.

How is it then possible for the guest to touch it again if the page is
not mapped anymore?

The guest should get a #PF when it touches the unmapped page, and that
should cause a new page to be mapped.

On Sun, May 17, 2020 at 07:36:00AM -0700, Jue Wang wrote:
> The stack is from guest MCE handler's processing of the first MCE injected.

Aha, so you've flipped the order of the functions in the trace. It all
starts at

set_mce_nospec(m->addr >> PAGE_SHIFT);

Now it makes sense.

> Note before the first MCE is injected into guest (i.e., after the host MCE
> handler successfully finished MCE handling and notified VMM via SIGBUS with
> BUS_MCEERR_AR), VMM unmaps the guest page from EPT.

Ok, good.
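A minimal sketch of the host-side step Jue describes, assuming a Linux userspace VMM: catch SIGBUS with SA_SIGINFO and treat si_code == BUS_MCEERR_AR as "this page is poisoned, act now". Everything other than the libc names (sigaction, siginfo_t, BUS_MCEERR_AR) is hypothetical, and the actual EPT unmap and MCE injection into the guest are left out.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Fallback in case the libc headers do not expose it (value from Linux uapi). */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif

/*
 * Hypothetical VMM bookkeeping: set by the handler, consumed later by the
 * vCPU loop, which would unmap the page from EPT and inject an MCE.
 * Strictly, C only guarantees volatile sig_atomic_t in handlers; storing
 * the pointer as well is a simplification for this sketch.
 */
static volatile sig_atomic_t poison_pending;
static void *volatile poison_addr;

/* "Action required" memory error: the poisoned page was actually consumed. */
static bool is_action_required_mce(const siginfo_t *si)
{
    return si->si_signo == SIGBUS && si->si_code == BUS_MCEERR_AR;
}

static void sigbus_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)ctx;
    if (is_action_required_mce(si)) {
        poison_addr = si->si_addr;   /* host virtual address of the poisoned page */
        poison_pending = 1;
    }
}

static int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;   /* we need si_code and si_addr */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

A VMM would call install_sigbus_handler() at startup; the handler only records the fault, leaving the unmap and the injection of the first MCE into the guest to the vCPU loop.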

> The guest MCE handler finished the "normal" MCE handling and recovery
> (memory_failure() in mm/memory-failure.c) successfully; it's the aftermath
> below that leads to the stack trace:
> https://github.com/torvalds/linux/blob/5a9ffb954a3933d7867f4341684a23e008d6839b/arch/x86/kernel/cpu/mce/core.c#L1101

On Sun, May 17, 2020 at 08:33:00AM -0700, Jue Wang wrote:
> In other words, it's the *do_memory_failure -> set_mce_nospec* flow of
> the guest MCE handler acting on the *first* MCE injection. As a result, the
> guest panics and resets *whenever* there is an MCE injected, even when the
> injected MCE is recoverable.

So IIUC that set_mce_nospec() thing should check whether m->addr is
mapped first and only then mark it _uc and whatever monkey business it
does. Not this blanket

if am I a guest?

test.
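The suggestion above, modeled as a minimal userspace sketch: derive the PFN the same way set_mce_nospec(m->addr >> PAGE_SHIFT) does, then decide based on whether that PFN is mapped rather than on whether we run as a guest. PAGE_SHIFT is assumed to be 12 (4 KiB pages), and pfn_is_mapped(), its lookup table and should_mark_uc() are hypothetical stand-ins, not kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. PAGE_SHIFT == 12 as on x86 by default. */
#define PAGE_SHIFT 12

/*
 * Hypothetical model of which PFNs are still mapped. A real kernel would
 * consult its page tables; this fixed table only illustrates the idea.
 */
static const uint64_t mapped_pfns[] = { 0x100, 0x101, 0x200 };

static bool pfn_is_mapped(uint64_t pfn)
{
    for (size_t i = 0; i < sizeof(mapped_pfns) / sizeof(mapped_pfns[0]); i++) {
        if (mapped_pfns[i] == pfn)
            return true;
    }
    return false;
}

/*
 * Hypothetical stand-in for the decision inside set_mce_nospec():
 * only change the caching attribute of a poisoned page if it is mapped,
 * instead of bailing out on a blanket "am I a guest?" check.
 * Returns true if the page would be marked uncacheable.
 */
static bool should_mark_uc(uint64_t addr)
{
    uint64_t pfn = addr >> PAGE_SHIFT;   /* same as m->addr >> PAGE_SHIFT */

    return pfn_is_mapped(pfn);
}
```

With this table, should_mark_uc(0x100fff) would return true (PFN 0x100 is mapped) and should_mark_uc(0x300000) would return false, regardless of whether the code runs on bare metal or in a guest.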

What about a hypervisor which doesn't set X86_FEATURE_HYPERVISOR, i.e.,
CPUID(1)[ECX, bit 31]?
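To make the CPUID point concrete: the hypervisor-present bit behind X86_FEATURE_HYPERVISOR is bit 31 of ECX as returned by CPUID leaf 1. The sketch below reads it from userspace with the `__get_cpuid()` helper from GCC/Clang's `<cpuid.h>`; hypervisor_bit_set() and query_hypervisor_bit() are hypothetical names. A clear bit only means that no hypervisor advertised itself, not that there is none, which is why keying the set_mce_nospec() behavior off it is fragile.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* CPUID.01H:ECX bit 31, exposed by the kernel as X86_FEATURE_HYPERVISOR. */
#define CPUID1_ECX_HYPERVISOR (1u << 31)

static bool hypervisor_bit_set(uint32_t ecx)
{
    return (ecx & CPUID1_ECX_HYPERVISOR) != 0;
}

/*
 * Returns 1 if the bit is set, 0 if it is clear (which does NOT prove
 * bare metal: a hypervisor may simply not set it), -1 if CPUID leaf 1
 * cannot be queried (e.g. non-x86 build).
 */
static int query_hypervisor_bit(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    return hypervisor_bit_set(ecx) ? 1 : 0;
#else
    return -1;
#endif
}
```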

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette