Re: [RFC] KVM: mm: fd-based approach for supporting KVM guest private memory
From: Yu Zhang
Date: Thu Aug 26 2021 - 22:32:01 EST
On Thu, Aug 26, 2021 at 12:15:48PM +0200, David Hildenbrand wrote:
> On 24.08.21 02:52, Sean Christopherson wrote:
> > The goal of this RFC is to try and align KVM, mm, and anyone else with skin in the
> > game, on an acceptable direction for supporting guest private memory, e.g. for
> > Intel's TDX. The TDX architecture effectively allows KVM guests to crash the
> > host if guest private memory is accessible to host userspace, and thus does not
> > play nice with KVM's existing approach of pulling the pfn and mapping level from
> > the host page tables.
> >
> > This is by no means a complete patch; it's a rough sketch of the KVM changes that
> > would be needed. The kernel side of things is completely omitted from the patch;
> > the design concept is below.
> >
> > There's also a fair bit of hand-waving on implementation details that shouldn't
> > fundamentally change the overall ABI, e.g. how the backing store will ensure
> > there are no mappings when "converting" to guest private.
> >
>
> This is a lot of complexity and rather advanced approaches (not saying they
> are bad, just that we try to teach the whole stack something completely
> new).
>
>
> What I think would really help is a list of requirements, such that
> everybody is aware of what we actually want to achieve. Let me start:
>
> GFN: Guest Frame Number
> EPFN: Encrypted Physical Frame Number
>
>
> 1) An EPFN must not get mapped into more than one VM: it belongs to exactly
> one VM. It must be shared neither between VMs in different processes nor
> between VMs within a process.
>
>
> 2) User space (well, and actually the kernel) must never access an EPFN:
>
> - If we go for an fd, essentially all operations (read/write) have to
> fail.
> - If we have to map an EPFN into user space page tables (e.g., to
> simplify KVM), we could only allow fake swap entries such that "there
> is something" but it cannot be accessed and is flagged accordingly.
> - /proc/kcore and friends have to be careful as well and should not read
> this memory. So there has to be a way to flag these pages.
>
> 3) We need a way to express the GFN<->EPFN mapping and essentially assign an
> EPFN to a GFN.
>
>
> 4) Once we have assigned an EPFN to a GFN, that assignment must no longer change.
> Further, an EPFN must not get assigned to multiple GFNs.
>
>
> 5) There has to be a way to "replace" encrypted parts by "shared" parts
> and the other way around.
>
> What else?
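Requirements 1), 3) and 4) together describe an ownership table keyed by EPFN. The following is a minimal user-space sketch of that bookkeeping, purely to make the invariants concrete. Everything in it (struct epfn_owner, epfn_assign(), NR_EPFNS, the VM id convention) is hypothetical and is not an existing KVM or mm interface. It also treats a repeated identical assignment as -EBUSY, which is a simplifying assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical sizes and sentinels for this sketch only. */
#define NR_EPFNS 16
#define NO_VM    0            /* VM ids start at 1 */
#define NO_GFN   UINT64_MAX

/* Hypothetical per-EPFN ownership record. */
struct epfn_owner {
	int      vm_id;   /* requirement 1: exactly one owning VM */
	uint64_t gfn;     /* requirement 4: exactly one GFN, never changed */
};

static struct epfn_owner owners[NR_EPFNS];

/* Mark every EPFN as unassigned. */
static void epfn_table_init(void)
{
	for (int i = 0; i < NR_EPFNS; i++) {
		owners[i].vm_id = NO_VM;
		owners[i].gfn = NO_GFN;
	}
}

/*
 * Requirement 3: assign an EPFN to a GFN of one VM.
 * Returns 0 on success, -EINVAL for bad arguments, and -EBUSY if the EPFN
 * is already assigned (to any VM or GFN), which enforces requirements 1
 * and 4: no second owner, no second GFN, no later change.
 */
static int epfn_assign(int vm_id, uint64_t epfn, uint64_t gfn)
{
	if (epfn >= NR_EPFNS || vm_id == NO_VM || gfn == NO_GFN)
		return -EINVAL;
	if (owners[epfn].vm_id != NO_VM)
		return -EBUSY;
	owners[epfn].vm_id = vm_id;
	owners[epfn].gfn = gfn;
	return 0;
}
```

Requirement 5) ("replacing" encrypted parts by shared ones) is deliberately left out: under requirement 4) it cannot be modeled as rewriting an entry, only as releasing the EPFN entirely, and how that release happens is exactly the open design question.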
Thanks a lot for this summary. A question about the requirements: do we or
do we not plan to support assigned devices in the protected VM?
If yes, the fd-based solution may need to change the VFIO interface as well
(though the fake swap entry solution would need to mess with VFIO too), because:
1> KVM uses VFIO when assigning devices into a VM.
2> Since we do not know which GPA ranges the VM may use as DMA buffers, all
guest pages have to be mapped in the host IOMMU page table to host pages,
which stay pinned for the whole life cycle of the VM.
3> The IOMMU mapping is done at VM creation time by VFIO and the IOMMU driver,
in vfio_dma_do_map().
4> However, vfio_dma_do_map() needs the HVA to perform a GUP to get the HPA
and pin the page.
But with an fd-based solution, not every GPA has an HVA, so the current
VFIO interface to map and pin the GPA (IOVA) won't work. And I doubt
VFIO can be modified to support this easily.
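To illustrate points 2> and 4>: the VFIO type1 VFIO_IOMMU_MAP_DMA ioctl takes a user virtual address (vaddr) together with the IOVA, so each guest page must first be translated to an HVA before it can be pinned. The sketch below models that dependency in user space. struct guest_slot, gpa_to_hva() and map_all_guest_pages() are hypothetical stand-ins, not the real vfio_dma_do_map() path, and "pinning" is reduced to counting pages that have an HVA. An hva of 0 is assumed to stand for an fd-backed private range with no HVA.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical guest memory slot; hva == 0 models fd-based private memory. */
struct guest_slot {
	uint64_t base_gpa;
	uint64_t size;    /* bytes, page aligned */
	uint64_t hva;     /* 0 => no host virtual address */
};

/* Translate a GPA to an HVA; returns -EFAULT if no HVA exists for it. */
static int gpa_to_hva(const struct guest_slot *slots, size_t nr,
		      uint64_t gpa, uint64_t *hva)
{
	for (size_t i = 0; i < nr; i++) {
		const struct guest_slot *s = &slots[i];

		if (gpa >= s->base_gpa && gpa - s->base_gpa < s->size) {
			if (!s->hva)
				return -EFAULT;
			*hva = s->hva + (gpa - s->base_gpa);
			return 0;
		}
	}
	return -EFAULT;
}

/*
 * Model of point 2>: DMA ranges are unknown, so every guest page is mapped
 * up front. Each page needs an HVA for the GUP-style pin (point 4>); the
 * walk stops at the first page that has none.
 */
static int map_all_guest_pages(const struct guest_slot *slots, size_t nr,
			       uint64_t *mapped_pages)
{
	*mapped_pages = 0;
	for (size_t i = 0; i < nr; i++) {
		for (uint64_t off = 0; off < slots[i].size;
		     off += 1ULL << PAGE_SHIFT) {
			uint64_t hva;
			int ret = gpa_to_hva(slots, nr,
					     slots[i].base_gpa + off, &hva);

			if (ret)
				return ret;   /* nothing to GUP and pin */
			(void)hva;            /* a real pin would use this */
			(*mapped_pages)++;
		}
	}
	return 0;
}
```

With only HVA-backed slots every page gets "pinned"; as soon as one slot is fd-backed private memory, the up-front mapping fails at its first page.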
B.R.
Yu