Re: [PATCH 11/11] KVM: x86: Add gmem hook for determining max NPT mapping level

From: Michael Roth
Date: Tue Apr 09 2024 - 19:47:27 EST


On Thu, Apr 04, 2024 at 02:50:33PM -0400, Paolo Bonzini wrote:
> From: Michael Roth <michael.roth@xxxxxxx>
>
> In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
> 2MB mapping in the guest's nested page table depends on whether or not
> any subpages within the range have already been initialized as private
> in the RMP table. The existing mixed-attribute tracking in KVM is
> insufficient here, for instance:
>
> - gmem allocates 2MB page
> - guest issues PVALIDATE on 2MB page
> - guest later converts a subpage to shared
> - SNP host code issues PSMASH to split 2MB RMP mapping to 4K
> - KVM MMU splits NPT mapping to 4K

Binbin caught that I'd neglected to document the last step in the
theoretical sequence here. It should state something to the effect
of:

- guest later converts that shared page back to private

-Mike

>
> At this point there are no mixed attributes, and KVM would normally
> allow for 2MB NPT mappings again, but this is actually not allowed
> because the RMP table mappings are 4K and cannot be promoted on the
> hypervisor side, so the NPT mappings must still be limited to 4K to
> match this.
>
> Add a hook to determine the max NPT mapping size in situations like
> this.
>
> Signed-off-by: Michael Roth <michael.roth@xxxxxxx>
> Message-Id: <20231230172351.574091-31-michael.roth@xxxxxxx>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 1 +
> arch/x86/include/asm/kvm_host.h | 2 ++
> arch/x86/kvm/mmu/mmu.c | 8 ++++++++
> 3 files changed, 11 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index c81990937ab4..2db87a6fd52a 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -140,6 +140,7 @@ KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> KVM_X86_OP_OPTIONAL(get_untagged_addr)
> KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
> KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
> +KVM_X86_OP_OPTIONAL_RET0(gmem_validate_fault)
> KVM_X86_OP_OPTIONAL(gmem_invalidate)
>
> #undef KVM_X86_OP
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 59c7b95034fc..67dc108dd366 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1811,6 +1811,8 @@ struct kvm_x86_ops {
> void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
> int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
> void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
> + int (*gmem_validate_fault)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, bool is_private,
> + u8 *max_level);
> };
>
> struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 992e651540e8..13dd367b8af1 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4338,6 +4338,14 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> fault->max_level);
> fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
>
> + r = static_call(kvm_x86_gmem_validate_fault)(vcpu->kvm, fault->pfn,
> + fault->gfn, fault->is_private,
> + &fault->max_level);
> + if (r) {
> + kvm_release_pfn_clean(fault->pfn);
> + return r;
> + }
> +
> return RET_PF_CONTINUE;
> }
>
> --
> 2.43.0
>