Re: [PATCH 14/21] KVM: TDX: Implement hooks to propagate changes of TDP MMU mirror page table

From: Yan Zhao
Date: Mon Nov 04 2024 - 04:13:10 EST


On Wed, Oct 30, 2024 at 11:03:39AM +0800, Binbin Wu wrote:
> On 9/4/2024 11:07 AM, Rick Edgecombe wrote:
> > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> >
> [...]
> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index 6feb3ab96926..b8cd5a629a80 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -447,6 +447,177 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
> > td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa);
> > }
> > +static void tdx_unpin(struct kvm *kvm, kvm_pfn_t pfn)
> > +{
> > + struct page *page = pfn_to_page(pfn);
> > +
> > + put_page(page);
> Nit: It can be
> put_page(pfn_to_page(pfn));
>
Yes, thanks.

>
> > +}
> > +
> > +static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
> > + enum pg_level level, kvm_pfn_t pfn)
> > +{
> > + int tdx_level = pg_level_to_tdx_sept_level(level);
> > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> > + hpa_t hpa = pfn_to_hpa(pfn);
> > + gpa_t gpa = gfn_to_gpa(gfn);
> > + u64 entry, level_state;
> > + u64 err;
> > +
> > + err = tdh_mem_page_aug(kvm_tdx, gpa, hpa, &entry, &level_state);
> Nit:
> Usually, kernel prefers to handle and return for error conditions first.
>
> But for this case, for all error conditions, it needs to unpin the page.
> Is it better to return the successful case first, so that it only needs
> to call tdx_unpin() once?
>
> > + if (unlikely(err == TDX_ERROR_SEPT_BUSY)) {
> > + tdx_unpin(kvm, pfn);
> > + return -EAGAIN;
> > + }
As how to handle the busy is not decided up to now, I prefer to leave this hunk
as is.

> > + if (unlikely(err == (TDX_EPT_ENTRY_STATE_INCORRECT | TDX_OPERAND_ID_RCX))) {
> > + if (tdx_get_sept_level(level_state) == tdx_level &&
> > + tdx_get_sept_state(level_state) == TDX_SEPT_PENDING &&
> > + is_last_spte(entry, level) &&
> > + spte_to_pfn(entry) == pfn &&
> > + entry & VMX_EPT_SUPPRESS_VE_BIT) {
> Can this condition be triggered?
> For contention from multiple vCPUs, the winner has frozen the SPTE,
> it shouldn't trigger this.
> Could KVM  do page aug for a same page multiple times somehow?
This condition should not be triggered due to the BUG_ON in
set_external_spte_present().

With Isaku's series [1], this condition will not happen either.

Will remove it. Thanks!


[1] https://lore.kernel.org/all/cover.1728718232.git.isaku.yamahata@xxxxxxxxx/

>
> > + tdx_unpin(kvm, pfn);
> > + return -EAGAIN;
> > + }
> > + }
> > + if (KVM_BUG_ON(err, kvm)) {
> > + pr_tdx_error_2(TDH_MEM_PAGE_AUG, err, entry, level_state);
> > + tdx_unpin(kvm, pfn);
> > + return -EIO;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
> > + enum pg_level level, kvm_pfn_t pfn)
> > +{
> > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> > +
> > + /* TODO: handle large pages. */
> > + if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
> > + return -EINVAL;
> > +
> > + /*
> > + * Because guest_memfd doesn't support page migration with
> > + * a_ops->migrate_folio (yet), no callback is triggered for KVM on page
> > + * migration. Until guest_memfd supports page migration, prevent page
> > + * migration.
> > + * TODO: Once guest_memfd introduces callback on page migration,
> > + * implement it and remove get_page/put_page().
> > + */
> > + get_page(pfn_to_page(pfn));
> > +
> > + if (likely(is_td_finalized(kvm_tdx)))
> > + return tdx_mem_page_aug(kvm, gfn, level, pfn);
> > +
> > + /*
> > + * TODO: KVM_MAP_MEMORY support to populate before finalize comes
> > + * here for the initial memory.
> > + */
> > + return 0;
> Is it better to return error before adding the support?
Hmm, returning error is better, though returning 0 is not wrong.
The future tdx_mem_page_record_premap_cnt() just increases
kvm_tdx->nr_premapped, while tdx_vcpu_init_mem_region() can be implemented
without checking kvm_tdx->nr_premapped.


> > +}
> > +
> [...]
>
>