Re: [PATCH v13 019/113] KVM: TDX: create/destroy VM structure

From: Zhi Wang
Date: Thu Apr 06 2023 - 12:05:32 EST


On Wed, 5 Apr 2023 11:16:18 -0700
Isaku Yamahata <isaku.yamahata@xxxxxxxxx> wrote:

> On Sun, Apr 02, 2023 at 11:41:58AM +0300,
> Zhi Wang <zhi.wang.linux@xxxxxxxxx> wrote:
>
> > > > > +void tdx_mmu_release_hkid(struct kvm *kvm)
> > > > > +{
> > > > > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> > > > > + cpumask_var_t packages;
> > > > > + bool cpumask_allocated;
> > > > > + u64 err;
> > > > > + int ret;
> > > > > + int i;
> > > > > +
> > > > > + if (!is_hkid_assigned(kvm_tdx))
> > > > > + return;
> > > > > +
> > > > > + if (!is_td_created(kvm_tdx))
> > > > > + goto free_hkid;
> > > > > +
> > > > > + cpumask_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL);
> > > > > + cpus_read_lock();
> > > > > + for_each_online_cpu(i) {
> > > > > + if (cpumask_allocated &&
> > > > > + cpumask_test_and_set_cpu(topology_physical_package_id(i),
> > > > > + packages))
> > > > > + continue;
> > > >
> > > > Is this necessary to check cpumask_allocated in the while loop? if cpumask
> > > > is not succefully allocated, wouldn't it be better to bail out just after
> > > > it?
> > >
> > > No because we can't return error here. It's better to do in-efficiently freeing
> > > resources instead of leak.
> > >
> > > We can move the check out of loop. But it would be ugly
> > > (if () {cpu loop} else {cpu loop} ) and this function isn't performance
> > > critical. Also I think it's okay to depend on compiler optimization for loop
> > > invariant. My compiler didn't optimize it in this case, though.
> > >
> >
> > Do you mean the tdh_mng_key_freeid() is still required if failing to allocate
> > the cpumask var and do TDH.PHYMEM_CACHE_WB(WBINVD) on each CPU?
>
> >
> > Out of curiosity, I took a look on the TDX module source code [1], it seems TDX
> > module has an additional check in TDH.MNG.KEY.FREEID. TDH.MNG.VPFLUSHDONE [2]
> > will mark the pending wbinvd in a bitmap:
> >
> > ...
> > /**
> > * Create the WBINVD_BITMAP per-package.
> > * Set to 1 num_of_pkgs bits from the LSB
> > */
> > global_data_ptr->kot.entries[curr_hkid].wbinvd_bitmap = global_data_ptr->pkg_config_bitmap; /* <----HERE */
> >
> > // Set new TD life cycle state
> > tdr_ptr->management_fields.lifecycle_state = TD_BLOCKED;
> >
> > // Set the proper new KOT entry state
> > global_data_ptr->kot.entries[curr_hkid].state = (uint8_t)KOT_STATE_HKID_FLUSHED;
> > ...
> >
> > And TDH.MNG.KEY.FREEID [3] will check if the pending WBINVD has been performed:
> >
> > ...
> > /**
> > * If TDH_PHYMEM_CACHE_WB was executed on all packages/cores,
> > * set the KOT entry, set the KOT entry state to HKID_FREE.
> > */
> > curr_hkid = tdr_ptr->key_management_fields.hkid;
> > tdx_debug_assert(global_data_ptr->kot.entr/ies[curr_hkid].state == KOT_STATE_HKID_FLUSHED);
> > if (global_data_ptr->kot.entries[curr_hkid].wbinvd_bitmap != 0) /* HERE */
> > {
> > TDX_ERROR("CACHEWB is not complete for this HKID (=%x)\n", curr_hkid);
> > return_val = TDX_WBCACHE_NOT_COMPLETE;
> > goto EXIT;
> > }
> > ...
> >
> > Guess the conclusion is: if TDH.PHYMEM.CACHE.WB is not performed on each
> > required CPU correctly, TDH.MNG.KEY.FREEID will fail as well. A leak seems
> > the only option (none of us likes a leak, but...).
>
> Why do we need to leak key? If we fails to allocate cpumask, we can issue
> TDH.PHYMEM.CACHE.WB on all pCPUs instead of all packages.
> If we call TDH.PHYMEM.CACHE.WB multiple times on the same package, it may return
> error. It's benign. It is suboptimal, but it's much better than leaking hkid.

I guess I misunderstood the following sentence in the previous email. Now I
get it. It is a combination of failure-resolving and normal-resolving.

> > We can move the check out of loop. But it would be ugly
> > (if () {cpu loop} else {cpu loop} ) and this function isn't performance
> > critical.