Re: [PATCH v19 037/130] KVM: TDX: Make KVM_CAP_MAX_VCPUS backend specific

From: Huang, Kai
Date: Thu May 09 2024 - 19:20:09 EST

Next message: Stephen Rothwell: "linux-next: manual merge of the mm tree with Linus' tree"
Previous message: Kent Overstreet: "Re: linux-next: manual merge of the refactor-heap tree with the block tree"
In reply to: Sean Christopherson: "Re: [PATCH v19 037/130] KVM: TDX: Make KVM_CAP_MAX_VCPUS backend specific"
Next in thread: Isaku Yamahata: "Re: [PATCH v19 037/130] KVM: TDX: Make KVM_CAP_MAX_VCPUS backend specific"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 10/05/2024 10:52 am, Sean Christopherson wrote:

On Fri, May 10, 2024, Kai Huang wrote:

On 10/05/2024 4:35 am, Sean Christopherson wrote:

KVM x86 limits KVM_MAX_VCPUS to 4096:

config KVM_MAX_NR_VCPUS
	int "Maximum number of vCPUs per KVM guest"
	depends on KVM
	range 1024 4096
	default 4096 if MAXSMP
	default 1024
	help

whereas the limitation from TDX is apparently simply due to TD_PARAMS taking
a 16-bit unsigned value:

#define TDX_MAX_VCPUS (~(u16)0)

i.e. it will likely be _years_ before TDX's limitation matters, if it ever does.
And _if_ it becomes a problem, we don't necessarily need to have a different
_runtime_ limit for TDX, e.g. TDX support could be conditioned on KVM_MAX_NR_VCPUS
being <= 64k.

Actually, in later versions of the TDX module (starting from 1.5 AFAICT), the
module has a metadata field to report the maximum number of vCPUs that the
module can support for all TDX guests.

My quick glance at the 1.5 source shows that the limit is still effectively
0xffff, so again, who cares? Assert on 0xffff at compile time, and on the
reported max at runtime, and simply refuse to use a TDX module that has dropped
the minimum below 0xffff.

I need to double check why this metadata field was added. My concern is that future module versions may simply lower the value.

But another way to handle this is to adjust the code if and when that actually happens? That might not be ideal for stable kernels, though?
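
For reference, the compile-time assert plus runtime check suggested above could look roughly like the sketch below. This is a standalone userspace illustration, not the patch: KVM_MAX_VCPUS is hard-coded to the 4096 Kconfig maximum, tdx_check_module_max_vcpus() is a hypothetical helper, and reported_max stands in for whatever the module's metadata field returns. The macro is cast back to 16 bits because C integer promotion would otherwise make a bare ~(u16)0 evaluate to an int with value -1.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * 16-bit limit imposed by TD_PARAMS. The outer cast matters: '~' promotes
 * its operand to int, so without it the expression would be -1, not 0xffff.
 */
#define TDX_MAX_VCPUS ((uint16_t)~(uint16_t)0)

/* Assumption: stand-in for the Kconfig-derived limit (range 1024..4096). */
#define KVM_MAX_VCPUS 4096

/* Compile-time half: KVM's own limit must fit in the 16-bit TDX field. */
static_assert(KVM_MAX_VCPUS <= TDX_MAX_VCPUS,
	      "KVM_MAX_VCPUS does not fit in TD_PARAMS' 16-bit vCPU count");

/*
 * Runtime half (hypothetical helper): refuse a TDX module whose reported
 * maximum has dropped below 0xffff. Returns 0 if the module is usable.
 */
static int tdx_check_module_max_vcpus(uint32_t reported_max)
{
	if (reported_max < TDX_MAX_VCPUS)
		return -EINVAL;
	return 0;
}
```

With this shape there is no separate runtime limit for TDX guests; a module that lowers its limit is simply rejected when TDX is set up.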

And we only allow kvm->max_vcpus to be updated in vt_vm_enable_cap() if it's
a TDX guest. The reason is that we want to avoid unnecessary changes for
normal VMX guests.

That's a frankly ridiculous reason to bury code in TDX. Nothing is _forcing_
userspace to set KVM_CAP_MAX_VCPUS, i.e. there won't be any change to VMX VMs
unless userspace _wants_ there to be a change.

Right. Anyway, allowing userspace to set KVM_CAP_MAX_VCPUS for non-TDX guests shouldn't cause any issues.

The main reason to bury the code in TDX is that it additionally needs to check tdx_info->max_vcpus_per_td. We can just do it in common code if we avoid that TDX-specific check.
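
To make the trade-off concrete, here is a rough userspace sketch of a common-code handler where the per-TD limit is the only TDX-specific step. Everything in it is an assumption for illustration: struct vm_sketch is not the real struct kvm, sketch_set_max_vcpus() is a hypothetical name, module_max stands in for tdx_info->max_vcpus_per_td, and the error codes are guesses rather than what the patch returns.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stand-in for the Kconfig-derived KVM_MAX_VCPUS. */
#define SKETCH_KVM_MAX_VCPUS 4096u

/* Hypothetical, trimmed-down VM state; not the real struct kvm. */
struct vm_sketch {
	bool is_td;             /* true for a TDX guest */
	uint32_t created_vcpus; /* vCPUs already created for this VM */
	uint32_t max_vcpus;     /* current per-VM vCPU limit */
};

/*
 * Hypothetical common-code handler for KVM_CAP_MAX_VCPUS. module_max
 * stands in for tdx_info->max_vcpus_per_td and is only consulted for TDs.
 */
static int sketch_set_max_vcpus(struct vm_sketch *vm, uint32_t requested,
				uint32_t module_max)
{
	if (requested == 0)
		return -EINVAL;
	if (requested > SKETCH_KVM_MAX_VCPUS)
		return -E2BIG;
	/* The only TDX-specific step: the per-TD limit from the module. */
	if (vm->is_td && requested > module_max)
		return -E2BIG;
	/* Assumption: the limit cannot change once vCPUs exist. */
	if (vm->created_vcpus)
		return -EBUSY;
	vm->max_vcpus = requested;
	return 0;
}
```

If the module is required to support 0xffff vCPUs, the is_td branch can never trigger for requests within KVM_MAX_VCPUS, and the handler becomes fully backend-agnostic.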
