Re: [PATCH v2 03/15] KVM: x86/mmu: Add a mirrored pointer to struct kvm_mmu_page

From: Paolo Bonzini
Date: Thu Jun 06 2024 - 12:04:50 EST


On Thu, May 30, 2024 at 11:07 PM Rick Edgecombe
<rick.p.edgecombe@xxxxxxxxx> wrote:
>
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
>
> Add a mirrored pointer to struct kvm_mmu_page for the private page table and
> add helper functions to allocate/initialize/free a private page table page.
> Because KVM TDP MMU doesn't use unsync_children and write_flooding_count,
> pack them to have room for a pointer and use a union to avoid memory
> overhead.
>
> For private GPA, CPU refers to a private page table whose contents are
> encrypted. The dedicated APIs to operate on it (e.g. updating/reading its
> PTE entry) are used, and their cost is expensive.
>
> When KVM resolves the KVM page fault, it walks the page tables. To reuse
> the existing KVM MMU code and mitigate the heavy cost of directly walking
> the private page table, allocate one more page for the mirrored page table
> for the KVM MMU code to directly walk. Resolve the KVM page fault with
> the existing code, and do additional operations necessary for the private
> page table. To distinguish such cases, the existing KVM page table is
> called a shared page table (i.e., not associated with a private page
> table), and the page table with a private page table is called a mirrored
> page table. The relationship is depicted below.
>
>               KVM page fault                     |
>                      |                           |
>                      V                           |
>         -------------+----------                 |
>         |                      |                 |
>         V                      V                 |
>      shared GPA           private GPA            |
>         |                      |                 |
>         V                      V                 |
>     shared PT root      mirror PT root           |    private PT root
>         |                      |                 |           |
>         V                      V                 |           V
>      shared PT           mirror PT  --propagate-->  private/mirrored PT
>         |                      |                 |           |
>         |                      \-----------------+------\    |
>         |                                        |      |    |
>         V                                        |      V    V
>   shared guest page                              |    private guest page
>                                                  |
>     non-encrypted memory                         |    encrypted memory
>                                                  |
> PT: Page table
> Shared PT: visible to KVM, and the CPU uses it for shared mappings.
> Private/mirrored PT: the CPU uses it, but it is invisible to KVM. TDX
>                      module updates this table to map private guest pages.
> Mirror PT: It is visible to KVM, but the CPU doesn't use it. KVM uses it
>            to propagate PT change to the actual private PT.

It is uncomfortably confusing which one is the "Mirror" PT and which
one is the "Mirrored" PT.

I hate to bikeshed even more, but while I like "Mirror PT" (a lot), I
would stick with "Private" or perhaps "External" for the pages owned
by the TDX module.

> +	/*
> +	 * This cache is to allocate private page table. E.g. private EPT used
> +	 * by the TDX module.
> +	 */
> +	struct kvm_mmu_memory_cache mmu_mirrored_spt_cache;

So this would be "mmu_external_spt_cache".

> -	unsigned int unsync_children;
> +	union {
> +		/* Those two members aren't used for TDP MMU */

s/Those/These/

> +		struct {
> +			unsigned int unsync_children;
> +			/*
> +			 * Number of writes since the last time traversal
> +			 * visited this page.
> +			 */
> +			atomic_t write_flooding_count;
> +		};
> +		/*
> +		 * Page table page of private PT.
> +		 * Passed to TDX module, not accessed by KVM.
> +		 */
> +		void *mirrored_spt;

s/mirrored_spt/external_spt/
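
To illustrate why the union adds no memory overhead, here is a minimal
userspace sketch, not the actual kernel definition. The struct name
mmu_page_sketch and the atomic_t typedef are stand-ins made up for sizing
only, external_spt is the rename suggested above, and the size check assumes
4-byte ints and pointers of at most 8 bytes:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's atomic_t (assumption, sizing only). */
typedef struct { int counter; } atomic_t;

/* Reduced, hypothetical stand-in for struct kvm_mmu_page. */
struct mmu_page_sketch {
	union {
		/* These two members aren't used for TDP MMU */
		struct {
			unsigned int unsync_children;
			atomic_t write_flooding_count;
		};
		/* Page handed to the TDX module, not accessed by KVM. */
		void *external_spt;
	};
};

/* The pointer overlays the two counters instead of following them. */
static_assert(offsetof(struct mmu_page_sketch, external_spt) ==
	      offsetof(struct mmu_page_sketch, unsync_children),
	      "external_spt must alias the shadow-MMU-only fields");

/* The union is no larger than the two packed counters. */
static_assert(sizeof(struct mmu_page_sketch) ==
	      sizeof(unsigned int) + sizeof(atomic_t),
	      "union must not grow the structure");
```

Since the TDP MMU never touches the two counters, reusing their storage for
the pointer is safe only as long as a given page is used by one MMU flavor.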

> +static inline void kvm_mmu_alloc_mirrored_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> +{
> +	/*
> +	 * mirrored_spt is allocated for TDX module to hold private EPT mappings,
> +	 * TDX module will initialize the page by itself.
> +	 * Therefore, KVM does not need to initialize or access mirrored_spt.
> +	 * KVM only interacts with sp->spt for mirrored EPT operations.
> +	 */
> +	sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_mirrored_spt_cache);
> +}
> +
> +static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> +{
> +	/*
> +	 * private_spt is allocated for TDX module to hold private EPT mappings,
> +	 * TDX module will initialize the page by itself.
> +	 * Therefore, KVM does not need to initialize or access private_spt.
> +	 * KVM only interacts with sp->spt for mirrored EPT operations.
> +	 */
> +	sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_mirrored_spt_cache);
> +}

Duplicate function: kvm_mmu_alloc_private_spt() has the same body as
kvm_mmu_alloc_mirrored_spt() above.
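
For illustration, a single helper with the "external" naming could look
roughly like the userspace sketch below. Everything suffixed _sketch is a
made-up stand-in (the real code uses struct kvm_vcpu, struct kvm_mmu_page and
kvm_mmu_memory_cache_alloc()), and mmu_alloc_external_spt() is only a
hypothetical name, not something the patch defines:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_mmu_memory_cache. */
struct memory_cache_sketch {
	int nobjs;
	void *objects[4];
};

/* Stand-in for kvm_mmu_memory_cache_alloc(): pop a pre-filled object. */
static void *cache_alloc_sketch(struct memory_cache_sketch *mc)
{
	assert(mc->nobjs > 0);
	return mc->objects[--mc->nobjs];
}

/* Hypothetical stand-in for the per-vCPU cache in struct kvm_vcpu_arch. */
struct vcpu_sketch {
	struct memory_cache_sketch mmu_external_spt_cache;
};

/* Hypothetical stand-in for struct kvm_mmu_page. */
struct mmu_page_sketch {
	void *spt;          /* mirror PT page, walked by KVM */
	void *external_spt; /* page passed to the TDX module, not accessed by KVM */
};

/* One helper instead of two identical ones. */
static inline void mmu_alloc_external_spt(struct vcpu_sketch *vcpu,
					  struct mmu_page_sketch *sp)
{
	/*
	 * The page is only handed over; it is neither initialized nor
	 * read here, mirroring the comment in the patch.
	 */
	sp->external_spt = cache_alloc_sketch(&vcpu->mmu_external_spt_cache);
}
```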

Naming aside, looks good.

Paolo

On Thu, May 30, 2024 at 11:07 PM Rick Edgecombe
<rick.p.edgecombe@xxxxxxxxx> wrote:
>
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
>
> Add a helper kvm_has_mirrored_tdp() to trigger this behavior and wire it
> to the TDX vm type.
>
> Co-developed-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
> Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
> Reviewed-by: Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx>
> ---
> TDX MMU Prep v2:
> - Rename private->mirror
> - Don't trigger off of shared mask
>
> TDX MMU Prep:
> - Rename terminology, dummy PT => mirror PT. and updated the commit message
> By Rick and Kai.
> - Added a comment on union of private_spt by Rick.
> - Don't handle the root case in kvm_mmu_alloc_private_spt(), it will not
> be needed in future patches. (Rick)
> - Update comments (Yan)
> - Remove kvm_mmu_init_private_spt(), open code it in later patches (Yan)
>
> v19:
> - typo in the comment in kvm_mmu_alloc_private_spt()
> - drop CONFIG_KVM_MMU_PRIVATE
> ---
> arch/x86/include/asm/kvm_host.h | 5 ++++
> arch/x86/kvm/mmu.h | 5 ++++
> arch/x86/kvm/mmu/mmu.c | 7 +++++
> arch/x86/kvm/mmu/mmu_internal.h | 47 ++++++++++++++++++++++++++++++---
> arch/x86/kvm/mmu/tdp_mmu.c | 1 +
> 5 files changed, 61 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index aabf1648a56a..250899a0239b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -817,6 +817,11 @@ struct kvm_vcpu_arch {
>  	struct kvm_mmu_memory_cache mmu_shadow_page_cache;
>  	struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
>  	struct kvm_mmu_memory_cache mmu_page_header_cache;
> +	/*
> +	 * This cache is to allocate private page table. E.g. private EPT used
> +	 * by the TDX module.
> +	 */
> +	struct kvm_mmu_memory_cache mmu_mirrored_spt_cache;
>
>  	/*
>  	 * QEMU userspace and the guest each have their own FPU state.
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index dc80e72e4848..0c3bf89cf7db 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -318,4 +318,9 @@ static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu,
>  		return gpa;
>  	return translate_nested_gpa(vcpu, gpa, access, exception);
>  }
> +
> +static inline bool kvm_has_mirrored_tdp(const struct kvm *kvm)
> +{
> +	return kvm->arch.vm_type == KVM_X86_TDX_VM;
> +}
>  #endif
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index b97241945596..5070ba7c6e89 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -685,6 +685,12 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
>  				       1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
>  	if (r)
>  		return r;
> +	if (kvm_has_mirrored_tdp(vcpu->kvm)) {
> +		r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_mirrored_spt_cache,
> +					       PT64_ROOT_MAX_LEVEL);
> +		if (r)
> +			return r;
> +	}
>  	r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
>  				       PT64_ROOT_MAX_LEVEL);
>  	if (r)
> @@ -704,6 +710,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
>  	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
>  	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
>  	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
> +	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_mirrored_spt_cache);
>  	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
>  }
>
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 706f0ce8784c..faef40a561f9 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -101,7 +101,22 @@ struct kvm_mmu_page {
>  		int root_count;
>  		refcount_t tdp_mmu_root_count;
>  	};
> -	unsigned int unsync_children;
> +	union {
> +		/* Those two members aren't used for TDP MMU */
> +		struct {
> +			unsigned int unsync_children;
> +			/*
> +			 * Number of writes since the last time traversal
> +			 * visited this page.
> +			 */
> +			atomic_t write_flooding_count;
> +		};
> +		/*
> +		 * Page table page of private PT.
> +		 * Passed to TDX module, not accessed by KVM.
> +		 */
> +		void *mirrored_spt;
> +	};
>  	union {
>  		struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
>  		tdp_ptep_t ptep;
> @@ -124,9 +139,6 @@ struct kvm_mmu_page {
>  	int clear_spte_count;
>  #endif
>
> -	/* Number of writes since the last time traversal visited this page. */
> -	atomic_t write_flooding_count;
> -
>  #ifdef CONFIG_X86_64
>  	/* Used for freeing the page asynchronously if it is a TDP MMU page. */
>  	struct rcu_head rcu_head;
> @@ -145,6 +157,33 @@ static inline int kvm_mmu_page_as_id(struct kvm_mmu_page *sp)
>  	return kvm_mmu_role_as_id(sp->role);
>  }
>
> +static inline void *kvm_mmu_mirrored_spt(struct kvm_mmu_page *sp)
> +{
> +	return sp->mirrored_spt;
> +}
> +
> +static inline void kvm_mmu_alloc_mirrored_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> +{
> +	/*
> +	 * mirrored_spt is allocated for TDX module to hold private EPT mappings,
> +	 * TDX module will initialize the page by itself.
> +	 * Therefore, KVM does not need to initialize or access mirrored_spt.
> +	 * KVM only interacts with sp->spt for mirrored EPT operations.
> +	 */
> +	sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_mirrored_spt_cache);
> +}
> +
> +static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> +{
> +	/*
> +	 * private_spt is allocated for TDX module to hold private EPT mappings,
> +	 * TDX module will initialize the page by itself.
> +	 * Therefore, KVM does not need to initialize or access private_spt.
> +	 * KVM only interacts with sp->spt for mirrored EPT operations.
> +	 */
> +	sp->mirrored_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_mirrored_spt_cache);
> +}
> +
>  static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
>  {
>  	/*
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 1259dd63defc..e7cd4921afe7 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -53,6 +53,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
>
> static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
> {
> +	free_page((unsigned long)sp->mirrored_spt);
>  	free_page((unsigned long)sp->spt);
>  	kmem_cache_free(mmu_page_header_cache, sp);
> }
> --
> 2.34.1
>