Re: [PATCH v2 06/15] KVM: x86/mmu: Support GFN direct mask

From: Edgecombe, Rick P
Date: Fri Jun 07 2024 - 14:39:35 EST


On Fri, 2024-06-07 at 09:59 +0200, Paolo Bonzini wrote:
> Just one non-cosmetic request at the very end of the email.
>
> On Thu, May 30, 2024 at 11:07 PM Rick Edgecombe
> <rick.p.edgecombe@xxxxxxxxx> wrote:
> > +static inline gfn_t kvm_gfn_root_mask(const struct kvm *kvm, const struct
> > kvm_mmu_page *root)
> > +{
> > +       if (is_mirror_sp(root))
> > +               return 0;
>
> Maybe add a comment:
>
> /*
>  * Since mirror SPs are used only for TDX, which maps private memory
>  * at its "natural" GFN, no mask needs to be applied to them - and, dually,
>  * we expect that the mask is only used for the shared PT.
>  */

Sure, seems like a good idea.

>
> > +       return kvm_gfn_direct_mask(kvm);
>
> Ok, please excuse me again for being fussy on the naming. Typically I
> think of a "mask" as something that you check against, or something
> that you do x &~ mask, not as something that you add. Maybe
> kvm_gfn_root_offset and gfn_direct_offset?
>
> I also thought of gfn_direct_fixed_bits, but I'm not sure it
> translates as well to kvm_gfn_root_fixed_bits. Anyway, I'll leave it
> to you to make a decision, speak up if you think it's not an
> improvement or if (especially for fixed_bits) it results in too long
> lines.
>
> Fortunately this kind of change is decently easy to do with a
> search/replace on the patch files themselves.

Yea, it's no problem to update the code. I'll be happy if this code is more
understandable for non-tdx developers.

As for the name, I guess I'd be less keen on "offset" because it's not clear
that it is a power-of-two value that can be used with bitwise operations.

I'm not sure what the "fixed" adds and it makes it longer. Also, many PTE bits
cannot be moved and they are not referred to as fixed, where the shared bit
actually *can* be moved via GPAW (not that the MMU code cares about that
though).

Just "bits" sounds better to me, so maybe I'll try?
kvm_gfn_direct_bits()
kvm_gfn_root_bits()

>
> > +}
> > +
> >   static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page
> > *sp)
> >   {
> >          /*
> > @@ -359,7 +368,12 @@ static inline int __kvm_mmu_do_page_fault(struct
> > kvm_vcpu *vcpu, gpa_t cr2_or_gp
> >          int r;
> >
> >          if (vcpu->arch.mmu->root_role.direct) {
> > -               fault.gfn = fault.addr >> PAGE_SHIFT;
> > +               /*
> > +                * Things like memslots don't understand the concept of a
> > shared
> > +                * bit. Strip it so that the GFN can be used like normal,
> > and the
> > +                * fault.addr can be used when the shared bit is needed.
> > +                */
> > +               fault.gfn = gpa_to_gfn(fault.addr) &
> > ~kvm_gfn_direct_mask(vcpu->kvm);
> >                  fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
>
> Please add a comment to struct kvm_page_fault's gfn field, about how
> it differs from addr.

Doh, yes totally.

>
> > +       /* Mask applied to convert the GFN to the mapping GPA */
> > +       gfn_t gfn_mask;
>
> s/mask/offset/ or s/mask/fixed_bits/ here, if you go for it; won't
> repeat myself below.
>
> >          /* The level of the root page given to the iterator */
> >          int root_level;
> >          /* The lowest level the iterator should traverse to */
> > @@ -120,18 +122,18 @@ struct tdp_iter {
> >    * Iterates over every SPTE mapping the GFN range [start, end) in a
> >    * preorder traversal.
> >    */
> > -#define for_each_tdp_pte_min_level(iter, root, min_level, start, end) \
> > -       for (tdp_iter_start(&iter, root, min_level, start); \
> > -            iter.valid && iter.gfn < end;                   \
> > +#define for_each_tdp_pte_min_level(iter, kvm, root, min_level, start,
> > end)               \
> > +       for (tdp_iter_start(&iter, root, min_level, start,
> > kvm_gfn_root_mask(kvm, root)); \
> > +            iter.valid && iter.gfn <
> > end;                                                \
> >               tdp_iter_next(&iter))
> >
> > -#define for_each_tdp_pte(iter, root, start, end) \
> > -       for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end)
> > +#define for_each_tdp_pte(iter, kvm, root, start,
> > end)                          \
> > +       for_each_tdp_pte_min_level(iter, kvm, root, PG_LEVEL_4K, start, end)
>
> Maybe add the kvm pointer / remove the mmu pointer in a separate patch
> to make the mask-related changes easier to identify?

Hmm, yea. I can split it.

>
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 7c593a081eba..0e6325b5f5e7 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -13987,6 +13987,16 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu,
> > unsigned int size,
> >   }
> >   EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
> >
> > +#ifdef __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
> > +int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64
> > nr_pages)
> > +{
> > +       if (!kvm_x86_ops.flush_remote_tlbs_range ||
> > kvm_gfn_direct_mask(kvm))
>
> I think the code need not check kvm_gfn_direct_mask() here? In the old
> patches that I have it check kvm_gfn_direct_mask() in the vmx/main.c
> callback.

You mean a VMX/TDX implementation of flush_remote_tlbs_range that just returns
-EOPNOTSUPP? Which version of the patches is this? I couldn't find anything like
that.

But I guess we could add one in the later patches. In which case we could drop
the hunk in this one. I see benefit being less churn.


The downside would be wider distribution of the concerns for dealing with
multiple aliases for a GFN. Currently, the behavior to have multiple aliases is
implemented in core MMU code. While it's fine to pollute tdx.c with TDX specific
knowledge of course, removing the handling of this corner from mmu.c might make
it less understandable for non-tdx readers who are working in MMU code.
Basically, if a concept fits into some non-TDX abstraction like this, having it
in core code seems the better default to me.

For this reason, my preference would be to leave the logic in core code. But I'm
fine changing it. I'll move it into the tdx.c for now, unless you are convinced
by the above.