Re: [PATCH v13 019/113] KVM: TDX: create/destroy VM structure

From: Isaku Yamahata
Date: Wed Apr 05 2023 - 14:16:27 EST


On Sun, Apr 02, 2023 at 11:41:58AM +0300,
Zhi Wang <zhi.wang.linux@xxxxxxxxx> wrote:

> > > > +void tdx_mmu_release_hkid(struct kvm *kvm)
> > > > +{
> > > > +        struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> > > > +        cpumask_var_t packages;
> > > > +        bool cpumask_allocated;
> > > > +        u64 err;
> > > > +        int ret;
> > > > +        int i;
> > > > +
> > > > +        if (!is_hkid_assigned(kvm_tdx))
> > > > +                return;
> > > > +
> > > > +        if (!is_td_created(kvm_tdx))
> > > > +                goto free_hkid;
> > > > +
> > > > +        cpumask_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL);
> > > > +        cpus_read_lock();
> > > > +        for_each_online_cpu(i) {
> > > > +                if (cpumask_allocated &&
> > > > +                    cpumask_test_and_set_cpu(topology_physical_package_id(i),
> > > > +                                             packages))
> > > > +                        continue;
> > >
> > > Is it necessary to check cpumask_allocated inside the loop? If the cpumask
> > > is not successfully allocated, wouldn't it be better to bail out right after
> > > the allocation?
> >
> > No, because we can't return an error here. It's better to free the resources
> > inefficiently than to leak them.
> >
> > We could move the check out of the loop, but it would be ugly
> > (if () {cpu loop} else {cpu loop}) and this function isn't performance
> > critical. Also, I think it's okay to rely on the compiler optimizing the
> > loop-invariant check. My compiler didn't optimize it in this case, though.
> >
>
> Do you mean that tdh_mng_key_freeid() is still required if we fail to allocate
> the cpumask var and instead do TDH.PHYMEM.CACHE.WB (WBINVD) on each CPU?

>
> Out of curiosity, I took a look at the TDX module source code [1]. It seems the
> TDX module has an additional check in TDH.MNG.KEY.FREEID. TDH.MNG.VPFLUSHDONE [2]
> records the pending WBINVD in a per-package bitmap:
>
> ...
> /**
>  * Create the WBINVD_BITMAP per-package.
>  * Set to 1 num_of_pkgs bits from the LSB
>  */
> global_data_ptr->kot.entries[curr_hkid].wbinvd_bitmap = global_data_ptr->pkg_config_bitmap; /* <----HERE */
>
> // Set new TD life cycle state
> tdr_ptr->management_fields.lifecycle_state = TD_BLOCKED;
>
> // Set the proper new KOT entry state
> global_data_ptr->kot.entries[curr_hkid].state = (uint8_t)KOT_STATE_HKID_FLUSHED;
> ...
>
> And TDH.MNG.KEY.FREEID [3] will check if the pending WBINVD has been performed:
>
> ...
> /**
>  * If TDH_PHYMEM_CACHE_WB was executed on all packages/cores,
>  * set the KOT entry, set the KOT entry state to HKID_FREE.
>  */
> curr_hkid = tdr_ptr->key_management_fields.hkid;
> tdx_debug_assert(global_data_ptr->kot.entries[curr_hkid].state == KOT_STATE_HKID_FLUSHED);
> if (global_data_ptr->kot.entries[curr_hkid].wbinvd_bitmap != 0) /* HERE */
> {
>     TDX_ERROR("CACHEWB is not complete for this HKID (=%x)\n", curr_hkid);
>     return_val = TDX_WBCACHE_NOT_COMPLETE;
>     goto EXIT;
> }
> ...
>
> I guess the conclusion is: if TDH.PHYMEM.CACHE.WB is not performed correctly on
> each required CPU, TDH.MNG.KEY.FREEID will fail as well. A leak seems to be the
> only option (none of us likes a leak, but...).

Why do we need to leak the key? If we fail to allocate the cpumask, we can issue
TDH.PHYMEM.CACHE.WB on all pCPUs instead of once per package.
If we call TDH.PHYMEM.CACHE.WB multiple times on the same package, it may return
an error, but that's benign. It's suboptimal, but much better than leaking the hkid.
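
Roughly, an untested sketch of what I mean. tdx_flush_caches_for_hkid() is just
a name made up for this sketch, tdx_do_tdh_phymem_cache_wb() stands in for the
per-CPU helper that issues the SEAMCALL, and error handling for the benign
repeat-on-the-same-package case is elided:

/* Assumed helper: issues TDH.PHYMEM.CACHE.WB on the current CPU. */
static int tdx_do_tdh_phymem_cache_wb(void *param);

static void tdx_flush_caches_for_hkid(void)
{
        cpumask_var_t packages;
        bool cpumask_allocated;
        int i;

        /* Best effort: if allocation fails, fall back to flushing on every CPU. */
        cpumask_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL);
        cpus_read_lock();
        for_each_online_cpu(i) {
                /*
                 * With the cpumask, issue TDH.PHYMEM.CACHE.WB once per
                 * package.  Without it, issue it on every online CPU; the
                 * repeats on an already-flushed package fail benignly and
                 * are ignored here.
                 */
                if (cpumask_allocated &&
                    cpumask_test_and_set_cpu(topology_physical_package_id(i),
                                             packages))
                        continue;

                smp_call_on_cpu(i, tdx_do_tdh_phymem_cache_wb, NULL, 1);
        }
        cpus_read_unlock();
        free_cpumask_var(packages);
}

This way TDH.MNG.KEY.FREEID can still succeed after the fallback path, so the
hkid is never leaked just because a small allocation failed.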
--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>