Re: [PATCH] x86/mm: Don't try to change poison pages to uncacheable in a guest
From: Sean Christopherson
Date: Tue May 19 2020 - 01:04:40 EST
On Mon, May 18, 2020 at 06:55:00PM +0200, Borislav Petkov wrote:
> On Mon, May 18, 2020 at 08:36:25AM -0700, Luck, Tony wrote:
> > The VMM gets the page fault (because the unmapping of the guest
> > physical address is at the VMM EPT level). The VMM can't map a new
> > page into that guest physical address because it has no way to
> > replace the contents of the old page. The VMM could pass the #PF
> > to the guest, but that would just confuse the guest (its page tables
> > all say that the page is still valid). In this particular case the
> > page is part of the 1:1 kernel map. So the kernel will OOPS (I think).
>
> ...
>
> > Please explain how a guest (that doesn't even know that it is a guest)
> > is going to figure out that the EPT tables (that it has no way to access)
> > have marked this page invalid in guest physical address space.
>
> So somewhere BUS_MCEERR_AR was mentioned. So I'm assuming the error
> severity was "action required". What does happen in the kernel, on
> baremetal, with an AR error in kernel space, i.e., kernel memory?
>
> If we can't fixup the exception, we die.
>
> So why should the guest behave any differently?
>
> Now, if you want for the guest to be more "robust" and handle that
> thing, fine. But then you'd need an explicit way to tell the guest
> kernel: "you've just had an MCE and I unmapped the page" so that the
> guest kernel can figure out what to do. Even if it means to panic.
>
> I.e., signal in an explicit way that EPT violation Jue is talking about
> in the other mail.

Well, technically the CLFLUSH thing is a KVM emulation bug, but it sounds
like that's a moot point since the pmem-enabled guest will make real
accesses to the poisoned page shortly thereafter. E.g. teaching KVM to
eat the -EHWPOISON on CLFLUSH would only postpone the guest's death.

As for how the second #MC occurs: on the EPT violation, KVM translates the
GPA to a host virtual address (KVM maintains a simple GPA->HVA lookup) and
does a gup() on that address to get the pfn. gup() returns -EHWPOISON for
the poisoned page, which KVM converts into a SIGBUS with si_code
BUS_MCEERR_AR. The userspace VMM, e.g. Qemu, sees the BUS_MCEERR_AR and
injects it back into the guest as a virtual #MC.
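For illustration, here is a minimal userspace sketch of the VMM end of that flow, assuming a Linux host and glibc. It classifies a SIGBUS by si_code (BUS_MCEERR_AR vs. BUS_MCEERR_AO) and computes the start of the poisoned range from an address and the lsb that siginfo's si_addr_lsb would report. enum vmm_action, classify_sigbus(), poison_base() and the handler are hypothetical names, not Qemu's code. The HVA->GPA translation and the actual #MC injection are left out.

```c
#define _GNU_SOURCE /* glibc exposes BUS_MCEERR_AR/AO only with this */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

/* Fallback to the Linux values in case the headers hide them. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical VMM-side actions; the names are illustrative only. */
enum vmm_action {
	VMM_IGNORE,        /* not a memory-error SIGBUS */
	VMM_INJECT_MCE_AR, /* inject an "action required" virtual #MC */
	VMM_INJECT_MCE_AO, /* inject an "action optional" virtual #MC */
};

/* Map the si_code of a SIGBUS onto what the VMM should do with it. */
static enum vmm_action classify_sigbus(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return VMM_INJECT_MCE_AR;
	case BUS_MCEERR_AO:
		return VMM_INJECT_MCE_AO;
	default:
		return VMM_IGNORE;
	}
}

/* lsb is the least significant valid bit of the faulting address (what
 * si_addr_lsb reports), so the poisoned range is (1 << lsb) bytes
 * starting at the returned, aligned base. */
static uintptr_t poison_base(uintptr_t addr, unsigned int lsb)
{
	assert(lsb < sizeof(uintptr_t) * 8); /* avoid an undefined shift */
	return addr & ~(((uintptr_t)1 << lsb) - 1);
}

static volatile sig_atomic_t last_action = VMM_IGNORE;

static void sigbus_handler(int sig, siginfo_t *si, void *uctx)
{
	(void)sig;
	(void)uctx;
	last_action = classify_sigbus(si->si_code);
	/* A real VMM would map si->si_addr (an HVA) back to a GPA and queue
	 * a virtual #MC for the vCPU here; omitted in this sketch. */
}

static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO; /* needed to receive si_code/si_addr */
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

The point is only that the AR/AO distinction travels to userspace in si_code, so the VMM can choose which kind of virtual #MC to inject.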
> You can inject a #PF or better yet the *first* MCE which is being
> injected should say with a bit somewhere "I unmapped the address in
> m->addr". So that the guest kernel can handle that properly and know
> what *exactly* it is getting an MCE for.
>
> What I don't like is the "am I running as a guest" check. Because
> someone else would come later and say, err, I'm not virtualizing this
> portion of MCA either, lemme add another "am I guest" check.
>
> Sure, it is a lot easier but when stuff like that starts spreading
> around in the MCE code, then we can just as well disable MCE when
> virtualized altogether. It would be a lot easier for everybody.