Re: [RFCv2 00/16] KVM protected memory extension

From: Andy Lutomirski
Date: Mon Oct 26 2020 - 19:58:54 EST


On Mon, Oct 26, 2020 at 8:29 AM Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> wrote:
>
> On Wed, Oct 21, 2020 at 11:20:56AM -0700, Andy Lutomirski wrote:
> > > On Oct 19, 2020, at 11:19 PM, Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> wrote:
> >
> > > For removing the userspace mapping, use a trick similar to what NUMA
> > > balancing does: convert memory that belongs to KVM memory slots to
> > > PROT_NONE: all existing entries converted to PROT_NONE with mprotect() and
> > > the newly faulted in pages get PROT_NONE from the updated vm_page_prot.
> > > The new VMA flag -- VM_KVM_PROTECTED -- indicates that the pages in the
> > > VMA must be treated in a special way in the GUP and fault paths. The flag
> > > allows GUP to return the page even though it is mapped with PROT_NONE, but
> > > only if the new GUP flag -- FOLL_KVM -- is specified. Any userspace access
> > > to the memory would result in SIGBUS. Any GUP access without FOLL_KVM
> > > would result in -EFAULT.
> > >
> >
> > I definitely like the direction this patchset is going in, and I think
> > that allowing KVM guests to have memory that is inaccessible to QEMU
> > is a great idea.
> >
> > I do wonder, though: do we really want to do this with these PROT_NONE
> > tricks, or should we actually come up with a way to have KVM guest map
> > memory that isn't mapped into QEMU's mm_struct at all? As an example
> > of the latter, I mean something a bit like this:
> >
> > https://lkml.kernel.org/r/CALCETrUSUp_7svg8EHNTk3nQ0x9sdzMCU=h8G-Sy6=SODq5GHg@xxxxxxxxxxxxxx
> >
> > I don't mean to say that this is a requirement of any kind of
> > protected memory like this, but I do think we should understand the
> > tradeoffs, in terms of what a full implementation looks like, the
> > effort and time frames involved, and the maintenance burden of
> > supporting whatever gets merged going forward.
>
> I considered the PROT_NONE trick neat. Complete removing of the mapping
> from QEMU would require more changes into KVM and I'm not really familiar
> with it.

I think it's neat. The big tradeoff I'm concerned about is that it
will likely become ABI once it lands. That is, if this series lands,
then we will always have to support the case in which QEMU has a
special non-present mapping that is nonetheless reflected as present
in a guest. This is a bizarre state of affairs, it may become
obsolete if a better API ever shows up, and it might end up placing
constraints on the Linux VM that we don't love going forward.

I don't think my proposal in the referenced thread above is that crazy
or that difficult to implement. The basic idea is to have a way to
create an mm_struct that is not loaded in CR3 anywhere. Instead, KVM
will reference it, much as it currently references QEMU's mm_struct,
to mirror mappings into the guest. This means it would be safe to
have "protected" memory mapped into the special mm_struct because
nothing other than KVM will ever reference the PTEs. But I think that
someone who really understands the KVM memory mapping code should
chime in.

>
> About tradeoffs: the trick interferes with AutoNUMA. I didn't put much
> thought into how we can get it work together. Need to look into it.
>
> Do you see other tradeoffs?
>
> --
> Kirill A. Shutemov