Re: [RFC PATCH 6/8] KVM: x86: Implement kvm_arch_{, pre_}vcpu_map_memory()

From: Sean Christopherson
Date: Wed Apr 03 2024 - 19:15:46 EST


On Tue, Mar 19, 2024, Isaku Yamahata wrote:
> On Wed, Mar 06, 2024 at 05:51:51PM -0800,
> > Yes. For the SNP or TDX case, we'd like to map the exact GPA range; we don't
> > want to also map zeroed pages around the range. For SNP or TDX, mapping a page
> > to a GPA is a one-time operation, and it updates the measurement.
> >
> > Say we'd like to populate GPA1 and GPA2 with the initial guest memory image,
> > and they are within the same 2M range. Map GPA1 first. If GPA2 is also mapped
> > (with zeros) by a 2M page, the subsequent mapping of GPA2 fails. Even if
> > mapping GPA2 succeeds, the measurement may already have been updated when
> > GPA1 was mapped.
> >
> > For SNP or TDX, it's the userspace VMM's responsibility to map a GPA range at
> > most once. Is this too strict a requirement for the default VM use case, i.e.
> > mitigating KVM page faults during guest boot? If so, what about a flag like
> > EXACT_MAPPING or something?
>
> I'm thinking as follows. What do you think?
>
> - Allow mapping larger than requested, via the gmem_max_level hook:

I don't see any reason to allow userspace to request a mapping level. If the
prefetch is defined to have read fault semantics, KVM has all the wiggle room it
needs to do the optimal/sane thing, without having to worry about reconciling
userspace's desired mapping level.
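A simplified, hypothetical model of what "KVM has all the wiggle room it needs" could look like: the level is derived purely from the vendor cap (the role the gmem_max_level hook plays in the quoted proposal), the backing page size, and memslot boundaries, with no userspace-supplied level to reconcile. The names (toy_slot, toy_pick_level) are made up for illustration and are not KVM's actual code; real KVM also considers things like dirty logging and mixed memory attributes, which this sketch ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Page-table levels, mirroring x86 naming: 4K = 1, 2M = 2, 1G = 3. */
enum pg_level { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3 };

/* Bytes covered by one mapping at @level: 4K << (9 * (level - 1)). */
static uint64_t level_size(int level)
{
	return 4096ULL << (9 * (level - 1));
}

/* Hypothetical memslot: GPA range [base, base + size), assumed 4K-aligned. */
struct toy_slot {
	uint64_t base;
	uint64_t size;
};

/*
 * Pick the mapping level for a read-fault-style prefetch of @gpa.
 * Start from the smaller of the vendor cap (e.g. 4K-only for a TDX-like
 * backend) and the backing page level, then shrink until the aligned
 * region at that level lies entirely inside the memslot.
 */
static int toy_pick_level(const struct toy_slot *slot, uint64_t gpa,
			  int vendor_max, int backing_level)
{
	int level = vendor_max < backing_level ? vendor_max : backing_level;

	assert(gpa >= slot->base && gpa < slot->base + slot->size);

	for (; level > PG_LEVEL_4K; level--) {
		uint64_t sz = level_size(level);
		uint64_t start = gpa & ~(sz - 1);

		if (start >= slot->base && start + sz <= slot->base + slot->size)
			break;
	}
	return level;
}
```

Because userspace never supplies a level, the only inputs are ones KVM (and the vendor backend) already own, so there is nothing to reconcile.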

> This depends on the following patch [1].
> The gmem_max_level hook allows the vendor backend to determine the max level.
> By default (for the default VM or sw-protected VM), it allows mappings up to
> KVM_MAX_HUGEPAGE_LEVEL. TDX allows only 4KB mappings.
>
> [1] https://lore.kernel.org/kvm/20231230172351.574091-31-michael.roth@xxxxxxx/
> [PATCH v11 30/35] KVM: x86: Add gmem hook for determining max NPT mapping level
>
> - Pure mapping without CoCo operation:
> As Sean suggested in [2], make KVM_MAP_MEMORY a pure mapping operation without
> any CoCo operation. In the case of TDX, the API doesn't issue TDX-specific
> operations like TDH.MEM.PAGE.ADD and TDH.MR.EXTEND; a separate TDX-specific API
> is needed for those.
>
> [2] https://lore.kernel.org/kvm/Ze-XW-EbT9vXaagC@xxxxxxxxxx/
>
> - KVM_MAP_MEMORY on an already-mapped area, potentially with a large page:
> It succeeds; it is not an error. It doesn't care whether the GPA is backed by a
> large page or not. Because the use case is pre-population before the guest
> runs, it doesn't matter whether the given GPA was already mapped, or at what
> page level it is backed.
>
> Do you want an error like -EEXIST?

No error. As above, I think the ioctl() should behave like a read fault, i.e.
be an expensive nop if there's nothing to be done.
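To make the contrast concrete, a toy model of a single 2M region (hypothetical names, not KVM or TDX code): a read-fault-style prefault treats an already-mapped page as success, while an exact one-shot add, loosely in the style of the CoCo flow described in the quoted text, refuses a page that is already present. It illustrates why a 2M prefault covering GPA1 would break a later exact add of GPA2 in the same region, and why keeping the prefetch free of CoCo operations lets it stay a nop.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_PAGES 512 /* 4K pages in one 2M region */

/* Hypothetical toy state for one 2M-aligned region of guest memory. */
struct toy_region {
	bool mapped[TOY_PAGES]; /* is 4K page i mapped (at any level)? */
	unsigned int measured;  /* pages folded into a pretend measurement */
};

/*
 * Read-fault semantics: if page @idx is already mapped, do nothing and
 * succeed (an expensive nop). Otherwise map it, optionally with a 2M page
 * that also covers every neighbouring 4K page (left zero-filled).
 */
static int toy_prefault(struct toy_region *r, unsigned int idx, bool use_2m)
{
	assert(idx < TOY_PAGES);

	if (r->mapped[idx])
		return 0;

	if (use_2m) {
		for (unsigned int i = 0; i < TOY_PAGES; i++)
			r->mapped[i] = true;
	} else {
		r->mapped[idx] = true;
	}
	return 0;
}

/*
 * Exact, one-shot add, loosely modelled on an "add page and extend the
 * measurement" flow: each page may be added exactly once.
 */
static int toy_exact_add(struct toy_region *r, unsigned int idx)
{
	assert(idx < TOY_PAGES);

	if (r->mapped[idx])
		return -EEXIST;

	r->mapped[idx] = true;
	r->measured++;
	return 0;
}
```

Mixing the two on the same region is exactly what goes wrong: after toy_prefault(&r, 1, true), a toy_exact_add(&r, 2) returns -EEXIST, whereas a second prefault of page 2 quietly succeeds.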

For VMA-based memory, userspace can operate on the userspace address. E.g. if
userspace wants to break CoW, it can do that by writing from userspace. And if
userspace wants to "request" a certain mapping level, it can do that by MADV_*.
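For VMA-based guest memory, a minimal sketch of doing this from userspace with standard Linux calls: madvise(MADV_HUGEPAGE) as a hint toward huge-page backing, then writing each page so none is left on the shared zero page for a later CoW break. alloc_guest_ram() is a hypothetical helper, and THP is only a hint that may be disabled on a given host.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Allocate @len bytes of private anonymous memory (as a VMM might for
 * VMA-based guest RAM), hint that transparent huge pages are wanted, and
 * write one byte per page so every page is privately backed and writable.
 * Returns the mapping, or NULL on failure; *thp_ok reports whether the
 * MADV_HUGEPAGE hint was accepted.
 */
static void *alloc_guest_ram(size_t len, int *thp_ok)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *mem;

	assert(page > 0 && len % (size_t)page == 0);

	mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return NULL;

	/* Hint only: fails (e.g. EINVAL) if the kernel lacks THP support. */
	*thp_ok = madvise(mem, len, MADV_HUGEPAGE) == 0;

	/* Write-touch each page: no zero-page mappings left to CoW-break. */
	for (size_t off = 0; off < len; off += (size_t)page)
		mem[off] = 0xff;

	return mem;
}
```

MADV_NOHUGEPAGE is the corresponding way to ask for small pages instead.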

For guest_memfd, there are no protections (everything is RWX, for now), and when
hugepage support comes along, userspace can simply manipulate the guest_memfd
instance as needed.