Re: [PATCH v9 016/105] KVM: TDX: create/destroy VM structure

From: Isaku Yamahata
Date: Thu Oct 13 2022 - 04:56:04 EST


On Wed, Oct 12, 2022 at 03:30:26PM -0700,
Sagi Shahar <sagis@xxxxxxxxxx> wrote:


> > +int tdx_vm_init(struct kvm *kvm)
> > +{
> > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> > + cpumask_var_t packages;
> > + int ret, i;
> > + u64 err;
> > +
> > + /* vCPUs can't be created until after KVM_TDX_INIT_VM. */
> > + kvm->max_vcpus = 0;
>
> The fact that vCPUs can't be created until KVM_TDX_INIT_VM is called
> will make it difficult to implement intra-host migration. See the longer
> discussion below.
...
> Me, Sean and Isaku had a short discussion offline regarding the
> interaction between the proposed API in this patch and intra-host
> migration. To summarize:
>
> For intra-host migration you generally want the destination VM to be
> initialized including the right number of vCPUs before you migrate the
> source VM state into it.
> The proposed API makes this difficult, since it forces the destination
> VM to call KVM_TDX_INIT_VM before creating vCPUs, which initializes TDX
> state and allocates a new hkid for the destination VM that would never
> be used. This can create a resource limitation on migrating VMs where
> there shouldn't be one.
>
> To solve this issue there are 2 main proposed changes to the API:
>
> 1. Add a new API based on ioctl(KVM_ENABLE_CAP) to let userspace
> modify the max number of vcpus:
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 43a6a7efc6ec..6055098b025b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6278,6 +6278,18 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> }
> mutex_unlock(&kvm->lock);
> break;
> + case KVM_CAP_MAX_VCPUS:
> + r = -EINVAL;
> + if (cap->args[0] > KVM_MAX_VCPUS)
> + break;
> +
> + mutex_lock(&kvm->lock);
> + if (!kvm->created_vcpus) {
> + kvm->max_vcpus = cap->args[0];
> + r = 0;
> + }
> + mutex_unlock(&kvm->lock);
> + break;
> case KVM_CAP_MAX_VCPU_ID:
> r = -EINVAL;
> if (cap->args[0] > KVM_MAX_VCPU_IDS)
>
> 2. Modify the existing API such that max_vcpus is set to KVM_MAX_VCPUS,
> like in regular VMs, and have KVM_TDX_INIT_VM fail if the user has
> already created more vCPUs than the number specified.
>
> For option (1), there are some possible variations:
> 1.a. Do we keep the max_vcpus argument in KVM_TDX_INIT_VM? If so, we
> need to check if max_vcpus matches the number of max_vcpus already set
> and fail otherwise.
> 1.b. Do we require KVM_ENABLE_CAP_VM(KVM_CAP_MAX_VCPUS) to be called?
> Theoretically, we can set max_vcpus to the KVM default KVM_MAX_VCPUS
> and allow the user to change it as long as no vCPUs have been created.
> If KVM_ENABLE_CAP_VM(KVM_CAP_MAX_VCPUS) is never called, the behavior
> remains the same as for regular VMs today.
>
> In my opinion, the cleanest solution would be option 1 (the new
> KVM_CAP_MAX_VCPUS API): remove the max_vcpus argument from
> KVM_TDX_INIT_VM, set the initial max_vcpus to KVM_MAX_VCPUS, and do not
> require the new ioctl to be called unless userspace specifically wants
> to limit the number of vCPUs. In that case, KVM_CAP_MAX_VCPUS can be
> called at any time until vCPUs are created.

Regarding KVM_CAP_MAX_VCPUS vs. KVM_TDX_INIT_VM: KVM_CAP_MAX_VCPUS is more
generic, so it would be better. This follows the precedent of the TSC frequency.

If option (1) is adopted, the logic should go into the common code, i.e. under
linux/virt/kvm/, because there is nothing x86-specific about it. I don't see
any use case other than TDX, though.
--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>