Re: [PATCH v2 07/10] KVM: x86: Always populate L1 GPA for KVM_MAP_MEMORY

From: Isaku Yamahata
Date: Mon Apr 15 2024 - 21:49:44 EST


On Mon, Apr 15, 2024 at 02:17:02PM -0700,
Sean Christopherson <seanjc@xxxxxxxxxx> wrote:

> > > - Return error on guest mode or SMM mode:  Without this patch.
> > >   Pros: No additional patch.
> > >   Cons: Difficult to use.
> >
> > Hmm... For the non-TDX use cases this is just an optimization, right? For TDX
> > there shouldn't be an issue. If so, maybe this last one is not so horrible.
>
> And the fact there are so many variables to control (MAXPHYADDR, SMM, and guest_mode)
> basically invalidates the argument that returning an error makes the ioctl() hard
> to use. I can imagine it might be hard to squeeze this ioctl() into QEMU's
> existing code, but I don't buy that the ioctl() itself is hard to use.
>
> Literally the only thing userspace needs to do is set CPUID to implicitly select
> between 4-level and 5-level paging. If userspace wants to pre-map memory during
> live migration, or when jump-starting the guest with pre-defined state, simply
> pre-map memory before stuffing guest state. In and of itself, that doesn't seem
> difficult, e.g. at a quick glance, QEMU could add a hook somewhere in
> kvm_vcpu_thread_fn() without too much trouble (though that comes with a huge
> disclaimer that I only know enough about how QEMU manages vCPUs to be dangerous).
>
> I would describe the overall cons for this patch versus returning an error
> differently. Switching MMU state puts the complexity in the kernel. Returning
> an error punts any complexity to userspace. Specifically, anything that KVM can
> do regarding vCPU state to get the right MMU, userspace can do too.
>
> Add on that silently doing things that effectively ignore guest state usually
> ends badly, and I don't see a good argument for this patch (or any variant
> thereof).
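
Fair point. To make the userspace side concrete, here is a rough sketch of
that flow. It is only illustrative: it assumes the KVM_MAP_MEMORY ioctl and
the struct kvm_memory_mapping layout proposed by this series (the
pre_map_range() helper and its retry policy are hypothetical, not part of the
series), and error handling is minimal.

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
/* KVM_MAP_MEMORY and struct kvm_memory_mapping from the patched header. */
#include <linux/kvm.h>

/*
 * Pre-map a GPA range on a vCPU fd before stuffing guest state.  The
 * kernel advances mapping.base_address and shrinks mapping.size as it
 * makes progress, so retry until the whole range is populated.
 */
static int pre_map_range(int vcpu_fd, uint64_t gpa, uint64_t size)
{
	struct kvm_memory_mapping mapping = {
		.base_address = gpa,
		.size = size,
	};

	while (mapping.size) {
		if (ioctl(vcpu_fd, KVM_MAP_MEMORY, &mapping) < 0) {
			/* Retry if interrupted or asked to try again. */
			if (errno == EINTR || errno == EAGAIN)
				continue;
			/* e.g. EOPNOTSUPP if the vCPU is in SMM/guest mode. */
			return -errno;
		}
	}
	return 0;
}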

Ok, here is an experimental patch on top of 7/10 that returns an error. Is this
the right direction, or do we want to invoke the KVM page fault handler without
any check?

I can see the following options.

- Error if the vCPU is in SMM mode or guest mode: this patch
  Defer the decision until use cases come up.  We can use
  KVM_CAP_MAP_MEMORY and struct kvm_memory_mapping.flags for future
  enhancement.
  Pro: Keeps room for future enhancement while the use cases are still
  unclear.
  Con: The userspace VMM has to check/switch the vCPU mode; see the
  sketch after this list.

- Don't check the vCPU mode and proceed anyway
  Pro: It works.
  Con: Unclear what the uAPI should look like without concrete use cases.

- Always populate with the L1 GPA:
  This is a bad idea, for the reason above: silently ignoring guest state
  usually ends badly.
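
On the con of the first option: a userspace VMM can already query the SMM
state with the existing KVM_GET_VCPU_EVENTS ioctl and skip or defer
pre-mapping for such vCPUs instead of relying on -EOPNOTSUPP.  A minimal
sketch (the vcpu_in_smm() helper is hypothetical):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Return 1 if the vCPU is in SMM, 0 if not, -1 on ioctl failure. */
static int vcpu_in_smm(int vcpu_fd)
{
	struct kvm_vcpu_events events;

	memset(&events, 0, sizeof(events));
	if (ioctl(vcpu_fd, KVM_GET_VCPU_EVENTS, &events) < 0)
		return -1;

	return !!events.smi.smm;
}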

---
arch/x86/kvm/x86.c | 32 +++++++++-----------------------
1 file changed, 9 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8ba9c1720ac9..2f3ceda5c225 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5871,10 +5871,8 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_map_memory(struct kvm_vcpu *vcpu,
 			     struct kvm_memory_mapping *mapping)
 {
-	struct kvm_mmu *mmu = NULL, *walk_mmu = NULL;
 	u64 end, error_code = 0;
 	u8 level = PG_LEVEL_4K;
-	bool is_smm;
 	int r;
 
 	/*
@@ -5884,25 +5882,21 @@ int kvm_arch_vcpu_map_memory(struct kvm_vcpu *vcpu,
 	if (!tdp_enabled)
 		return -EOPNOTSUPP;
 
-	/* Force to use L1 GPA despite of vcpu MMU mode. */
-	is_smm = !!(vcpu->arch.hflags & HF_SMM_MASK);
-	if (is_smm ||
-	    vcpu->arch.mmu != &vcpu->arch.root_mmu ||
-	    vcpu->arch.walk_mmu != &vcpu->arch.root_mmu) {
-		vcpu->arch.hflags &= ~HF_SMM_MASK;
-		mmu = vcpu->arch.mmu;
-		walk_mmu = vcpu->arch.walk_mmu;
-		vcpu->arch.mmu = &vcpu->arch.root_mmu;
-		vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
-		kvm_mmu_reset_context(vcpu);
-	}
+	/*
+	 * SMM results in populating the SMM memory space (memslot address
+	 * space id = 1), and guest mode results in populating with L2 GPAs.
+	 * Don't support those cases for now, and punt them to future
+	 * discussion.
+	 */
+	if (is_smm(vcpu) || is_guest_mode(vcpu))
+		return -EOPNOTSUPP;
 
 	/* reload is optimized for repeated call. */
 	kvm_mmu_reload(vcpu);
 
 	r = kvm_tdp_map_page(vcpu, mapping->base_address, error_code, &level);
 	if (r)
-		goto out;
+		return r;
 
 	/* mapping->base_address is not necessarily aligned to level-hugepage. */
 	end = (mapping->base_address & KVM_HPAGE_MASK(level)) +
@@ -5910,14 +5904,6 @@ int kvm_arch_vcpu_map_memory(struct kvm_vcpu *vcpu,
 		KVM_HPAGE_SIZE(level);
 	mapping->size -= end - mapping->base_address;
 	mapping->base_address = end;
 
-out:
-	/* Restore MMU state. */
-	if (is_smm || mmu) {
-		vcpu->arch.hflags |= is_smm ? HF_SMM_MASK : 0;
-		vcpu->arch.mmu = mmu;
-		vcpu->arch.walk_mmu = walk_mmu;
-		kvm_mmu_reset_context(vcpu);
-	}
 	return r;
 }

--
2.43.2

--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>