Re: [PATCH v2 06/15] KVM: x86/mmu: Support GFN direct mask

From: Paolo Bonzini
Date: Fri Jun 07 2024 - 03:59:49 EST


> Keep TDX specific concepts out of the MMU code by not
> naming it "shared".

I think that, more than keeping TDX-specific concepts out of MMU code,
the better argument for a different name is that it avoids confusing
memory attributes with MMU concepts.

For example, SNP uses the same page tables for both shared and private
memory, as it handles them at the RMP level.

By the way, patch 3 still talks about "shared PT"; please change
that to "direct SP" (so we have "direct", "external", "mirror").

Just one non-cosmetic request at the very end of the email.

On Thu, May 30, 2024 at 11:07 PM Rick Edgecombe
<rick.p.edgecombe@xxxxxxxxx> wrote:
> +static inline gfn_t kvm_gfn_root_mask(const struct kvm *kvm, const struct kvm_mmu_page *root)
> +{
> +	if (is_mirror_sp(root))
> +		return 0;

Maybe add a comment:

/*
* Since mirror SPs are used only for TDX, which maps private memory
* at its "natural" GFN, no mask needs to be applied to them - and, dually,
* we expect that the mask is only used for the shared PT.
*/

> +	return kvm_gfn_direct_mask(kvm);

Ok, please excuse me again for being fussy about naming. Typically I
think of a "mask" as something you check against, or apply as
x & ~mask, not as something you add. Maybe kvm_gfn_root_offset and
gfn_direct_offset?

I also thought of gfn_direct_fixed_bits, but I'm not sure it
translates as well to kvm_gfn_root_fixed_bits. Anyway, I'll leave the
decision to you; speak up if you think it's not an improvement, or if
(especially for fixed_bits) it results in overly long lines.

Fortunately this kind of change is decently easy to do with a
search/replace on the patch files themselves.

> +}
> +
> static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
> {
> /*
> @@ -359,7 +368,12 @@ static inline int __kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gp
> 	int r;
>
> 	if (vcpu->arch.mmu->root_role.direct) {
> -		fault.gfn = fault.addr >> PAGE_SHIFT;
> +		/*
> +		 * Things like memslots don't understand the concept of a shared
> +		 * bit. Strip it so that the GFN can be used like normal, and the
> +		 * fault.addr can be used when the shared bit is needed.
> +		 */
> +		fault.gfn = gpa_to_gfn(fault.addr) & ~kvm_gfn_direct_mask(vcpu->kvm);
> 		fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);

Please add a comment to struct kvm_page_fault's gfn field, about how
it differs from addr.

> +	/* Mask applied to convert the GFN to the mapping GPA */
> +	gfn_t gfn_mask;

s/mask/offset/ or s/mask/fixed_bits/ here, if you go for it; won't
repeat myself below.

> 	/* The level of the root page given to the iterator */
> 	int root_level;
> 	/* The lowest level the iterator should traverse to */
> @@ -120,18 +122,18 @@ struct tdp_iter {
> * Iterates over every SPTE mapping the GFN range [start, end) in a
> * preorder traversal.
> */
> -#define for_each_tdp_pte_min_level(iter, root, min_level, start, end) \
> - for (tdp_iter_start(&iter, root, min_level, start); \
> - iter.valid && iter.gfn < end; \
> +#define for_each_tdp_pte_min_level(iter, kvm, root, min_level, start, end) \
> + for (tdp_iter_start(&iter, root, min_level, start, kvm_gfn_root_mask(kvm, root)); \
> + iter.valid && iter.gfn < end; \
> tdp_iter_next(&iter))
>
> -#define for_each_tdp_pte(iter, root, start, end) \
> - for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end)
> +#define for_each_tdp_pte(iter, kvm, root, start, end) \
> + for_each_tdp_pte_min_level(iter, kvm, root, PG_LEVEL_4K, start, end)

Maybe add the kvm pointer / remove the mmu pointer in a separate patch
to make the mask-related changes easier to identify?

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7c593a081eba..0e6325b5f5e7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13987,6 +13987,16 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
> }
> EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
>
> +#ifdef __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
> +int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64 nr_pages)
> +{
> +	if (!kvm_x86_ops.flush_remote_tlbs_range || kvm_gfn_direct_mask(kvm))

I think the code need not check kvm_gfn_direct_mask() here? In the old
patches that I have, the check is done in the vmx/main.c callback
instead.

Paolo