Re: [RFC PATCH 0/8] KVM: x86/mmu: Introduce pinned SPTEs framework

From: Sean Christopherson
Date: Mon Aug 03 2020 - 13:16:22 EST


On Mon, Aug 03, 2020 at 10:52:05AM -0500, Brijesh Singh wrote:
> Thanks for the series, Sean. Some thoughts:
>
>
> On 7/31/20 4:23 PM, Sean Christopherson wrote:
> > SEV currently needs to pin guest memory as it doesn't support migrating
> > encrypted pages. Introduce a framework in KVM's MMU to support pinning
> > pages on demand without requiring additional memory allocations, and with
> > (somewhat hazy) line of sight toward supporting more advanced features for
> > encrypted guest memory, e.g. host page migration.
>
>
> Eric's attempt at lazy pinning suffers from the memory allocation
> problem, and your series seems to address it. As you have noticed,
> the current SEV enablement in KVM does not support migrating
> encrypted pages. But recent SEV firmware provides support for
> migrating encrypted pages (e.g. host page migration). The support is
> available in SEV FW >= 0.17.

I assume SEV also doesn't support ballooning? Ballooning would be a good
first step toward page migration as I think it'd be easier for KVM to
support, e.g. only needs to deal with the "zap" and not the "move".

> > The idea is to use a software available bit in the SPTE to track that a
> > page has been pinned. The decision to pin a page and the actual pinning
> > management is handled by vendor code via kvm_x86_ops hooks. There are
> > intentionally two hooks (zap and unzap) introduced that are not needed for
> > SEV. I included them to again show how the flag (probably renamed?) could
> > be used for more than just pin/unpin.
>
> If using the available software bits for tracking the pinning is
> acceptable, then it can be used for non-SEV guests (if needed). I
> will look through your patch more carefully, but one immediate question:
> when do we unpin the pages? In the case of SEV, once a page is
> pinned it should not be unpinned until the guest terminates. If we
> unpin the page before the VM terminates, then there is a chance the host
> page migration will kick in and move the pages. The KVM MMU code may
> drop the SPTEs during zap/unzap, which happens a lot during guest
> execution, and that will lead us down a path where vendor-specific
> code unpins the pages during guest execution and causes data
> corruption for the SEV guest.

The pages are unpinned by:

  drop_spte()
  |
  -> rmap_remove()
     |
     -> sev_drop_pinned_spte()

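To make that chain concrete, here is a minimal userspace sketch, not the
code from the series. The function names mirror the chain above, but the
signatures are simplified, SPTE_PINNED_MASK at bit 62 is a hypothetical
choice of software-available bit, and the "unpin" is only a counter
standing in for whatever the vendor hook would really do.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration only; none of these values are from the series. */
#define SPTE_PRESENT_MASK 0x7ULL          /* bits 2:0, as in the title of patch 2 */
#define SPTE_PINNED_MASK  (1ULL << 62)    /* hypothetical software-available bit */
#define SPTE_PFN_SHIFT    12
#define SPTE_PFN_MASK     (((1ULL << 52) - 1) & ~((1ULL << SPTE_PFN_SHIFT) - 1))

static int unpin_calls;                   /* how many times the vendor hook ran */
static uint64_t last_unpinned_pfn;        /* PFN seen by the most recent hook call */

/* Stand-in for the vendor hook: records the PFN instead of unpinning a page. */
static void sev_drop_pinned_spte(uint64_t old_spte)
{
	assert(old_spte & SPTE_PINNED_MASK);
	last_unpinned_pfn = (old_spte & SPTE_PFN_MASK) >> SPTE_PFN_SHIFT;
	unpin_calls++;
}

/* Simplified rmap_remove(): only the pinned-flag check is modeled. */
static void rmap_remove(uint64_t old_spte)
{
	if (old_spte & SPTE_PINNED_MASK)
		sev_drop_pinned_spte(old_spte);
}

/* Clear the SPTE and hand back its old value, in the spirit of patch 1. */
static uint64_t mmu_spte_clear_track_bits(uint64_t *sptep)
{
	uint64_t old_spte = *sptep;

	*sptep = 0;
	return old_spte;
}

/* Simplified drop_spte(): zap the entry, then walk the unpin chain. */
static void drop_spte(uint64_t *sptep)
{
	uint64_t old_spte = mmu_spte_clear_track_bits(sptep);

	if (old_spte & SPTE_PRESENT_MASK)
		rmap_remove(old_spte);
}
```

In this model, calling drop_spte() on a present SPTE that carries the
pinned flag runs the hook exactly once, while a present but unpinned SPTE
is cleared without touching it. It also shows why any path that reaches
drop_spte() while the guest is running would unpin the page.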

The intent is to allow unpinning pages when the mm_struct dies, i.e. when
the memory is no longer reachable (as opposed to when the last reference to
KVM is put), but typing that out, I realize there are dependencies and
assumptions that don't hold true for SEV as implemented.

  - Parent shadow pages won't be zapped. Recycling MMU pages and zapping
    all SPs due to memslot updates are the two concerns.

    The easy way out for recycling is to not recycle SPs with pinned
    children, though that may or may not fly with VMM admins.

    I'm trying to resolve the memslot issue[*], but confirming that there's
    no longer an issue with not zapping everything is proving difficult as
    we haven't yet reproduced the original bug.

  - drop_large_spte() won't be invoked. I believe the only semi-legitimate
    scenario is if the NX huge page workaround is toggled on while a VM is
    running. Disallowing that if there is an SEV guest seems reasonable?

    There might be an issue with the host page size changing, but I don't
    think that can happen if the page is pinned. That needs more
    investigation.

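As a rough illustration of the "don't recycle SPs with pinned children"
option, the sketch below walks a list of shadow pages and skips any page
that still has pinned children. Everything in it is hypothetical: struct
shadow_page, its pinned_children counter and recycle_shadow_pages() are
stand-ins for illustration, not KVM's actual data structures or
recycling code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced model of a shadow page. */
struct shadow_page {
	unsigned int pinned_children;   /* assumed count of pinned child SPTEs */
	bool zapped;                    /* set once the page has been recycled */
};

/*
 * Zap up to nr_to_zap pages, oldest first, skipping pages that are already
 * zapped or that still have pinned children. Returns the number of pages
 * actually zapped, which can be lower than requested.
 */
static size_t recycle_shadow_pages(struct shadow_page *sps, size_t nr_sps,
				   size_t nr_to_zap)
{
	size_t zapped = 0;

	assert(sps != NULL || nr_sps == 0);

	for (size_t i = 0; i < nr_sps && zapped < nr_to_zap; i++) {
		if (sps[i].zapped || sps[i].pinned_children)
			continue;

		sps[i].zapped = true;
		zapped++;
	}

	return zapped;
}
```

The return value is where the trade-off shows up: with enough pinned
children around, a recycle request can come back short, which is the part
that may or may not be acceptable to VMM admins.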

[*] https://lkml.kernel.org/r/20200703025047.13987-1-sean.j.christopherson@xxxxxxxxx

> > Bugs in the core implementation are pretty much guaranteed. The basic
> > concept has been tested, but in a fairly different incarnation. Most
> > notably, tagging PRESENT SPTEs as PINNED has not been tested, although
> > using the PINNED flag to track zapped (and known to be pinned) SPTEs has
> > been tested. I cobbled this variation together fairly quickly to get the
> > code out there for discussion.
> >
> > The last patch to pin SEV pages during sev_launch_update_data() is
> > incomplete; it's there to show how we might leverage MMU-based pinning to
> > support pinning pages before the guest is live.
>
>
> I will add the SEV-specific bits and give this a try.
>
> >
> > Sean Christopherson (8):
> >   KVM: x86/mmu: Return old SPTE from mmu_spte_clear_track_bits()
> >   KVM: x86/mmu: Use bits 2:0 to check for present SPTEs
> >   KVM: x86/mmu: Refactor handling of not-present SPTEs in mmu_set_spte()
> >   KVM: x86/mmu: Add infrastructure for pinning PFNs on demand
> >   KVM: SVM: Use the KVM MMU SPTE pinning hooks to pin pages on demand
> >   KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault()
> >   KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by SEV
> >   KVM: SVM: Pin SEV pages in MMU during sev_launch_update_data()
> >
> >  arch/x86/include/asm/kvm_host.h |   7 ++
> >  arch/x86/kvm/mmu.h              |   3 +
> >  arch/x86/kvm/mmu/mmu.c          | 186 +++++++++++++++++++++++++-------
> >  arch/x86/kvm/mmu/paging_tmpl.h  |   3 +-
> >  arch/x86/kvm/svm/sev.c          | 141 +++++++++++++++++++++++-
> >  arch/x86/kvm/svm/svm.c          |   3 +
> >  arch/x86/kvm/svm/svm.h          |   3 +
> >  7 files changed, 302 insertions(+), 44 deletions(-)
> >