Re: [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest

From: Sean Christopherson
Date: Thu Nov 18 2021 - 10:05:56 EST


On Thu, Nov 18, 2021, Juergen Gross wrote:
> On 17.11.21 21:57, Sean Christopherson wrote:
> > Rather than make this a module param, I would prefer to start with the below
> > patch (originally from TDX pre-enabling) and then wire up a way for userspace to
> > _lower_ the max on a per-VM basis, e.g. add a capability.
>
> The main reason for this whole series is a request by a partner
> to enable huge VMs on huge machines (huge meaning thousands of
> vcpus on thousands of physical cpus).
>
> Making this large number a compile time setting would hurt all
> the users who have more standard requirements by allocating the
> needed resources even on small systems, so I've switched to a boot
> parameter in order to enable those huge numbers only when required.
>
> With Marc's series to use an xarray for the vcpu pointers only the
> bitmaps for sending IRQs to vcpus are left which need to be sized
> according to the max vcpu limit. Your patch below seems to be fine, but
> doesn't help for that case.

Ah, you want to let userspace define a MAX_VCPUS that goes well beyond the current
limit without negatively impacting existing setups. My idea of a per-VM capability
still works; it would simply require separating the default max from the absolute
max, which this patch mostly does already. It just neglects to set an absolute max.
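
To make the default vs. absolute split concrete, here is a rough userspace sketch,
not actual KVM code. Every name and value in it (struct sketch_vm, the SKETCH_*
limits, sketch_vm_set_max_vcpus()) is hypothetical, and refusing changes once vCPUs
exist is an assumption about the semantics, not something this thread has settled.

```c
#include <errno.h>

/* Hypothetical limits for illustration only; not the values KVM uses. */
#define SKETCH_DEFAULT_MAX_VCPUS   1024u
#define SKETCH_ABSOLUTE_MAX_VCPUS  4096u

/* Hypothetical per-VM state: each VM carries its own vCPU limit. */
struct sketch_vm {
	unsigned int max_vcpus;      /* per-VM limit, starts at the default */
	unsigned int created_vcpus;  /* number of vCPUs already created */
};

/* A new VM starts with the default max, so existing setups are unaffected. */
void sketch_vm_init(struct sketch_vm *vm)
{
	vm->max_vcpus = SKETCH_DEFAULT_MAX_VCPUS;
	vm->created_vcpus = 0;
}

/*
 * Stand-in for a per-VM capability handler: userspace may lower the limit,
 * or raise it up to the absolute max, but never beyond it.
 */
int sketch_vm_set_max_vcpus(struct sketch_vm *vm, unsigned int requested)
{
	if (requested == 0 || requested > SKETCH_ABSOLUTE_MAX_VCPUS)
		return -EINVAL;

	/* Assumption: the limit is fixed once any vCPU has been created. */
	if (vm->created_vcpus)
		return -EBUSY;

	vm->max_vcpus = requested;
	return 0;
}
```

Anything sized by the limit would then be allocated per VM from vm->max_vcpus, so
only the VMs that opt in to a huge value pay for it.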

Which is a good segue into pointing out that if a module param is added, it needs
to be sanity checked against a KVM-defined max. The admin may be trusted to some
extent, but there is zero reason to let userspace set max_vcpus to 4 billion.
At that point, it really is just a param vs. capability question.
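
The kind of sanity check I have in mind looks roughly like the sketch below. It is
plain userspace C rather than a real module param handler, and the helper name
sketch_parse_max_vcpus() and the 4096 ceiling are made up for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical ceiling for illustration; KVM would define the real one. */
#define SKETCH_PARAM_ABSOLUTE_MAX 4096u

/*
 * Parse a user-supplied max_vcpus string and reject anything that is not a
 * plain decimal number in the range [1, SKETCH_PARAM_ABSOLUTE_MAX].
 * On failure *out is left untouched.
 */
int sketch_parse_max_vcpus(const char *val, unsigned int *out)
{
	char *end;
	unsigned long n;

	errno = 0;
	n = strtoul(val, &end, 10);

	/* Overflow, empty input, or trailing garbage. */
	if (errno || end == val || *end != '\0')
		return -EINVAL;

	/* The actual sanity check against the defined maximum. */
	if (n == 0 || n > SKETCH_PARAM_ABSOLUTE_MAX)
		return -EINVAL;

	*out = (unsigned int)n;
	return 0;
}
```

With a check like this, a request for 4 billion vCPUs fails with -EINVAL instead of
sizing allocations off of it.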

I like the idea of a capability because there are already two known use cases,
arm64's GIC and x86's TDX, and it could also be used to reduce the kernel's footprint
for use cases that run large numbers of smaller VMs.

The other alternative would be to turn KVM_MAX_VCPUS into a Kconfig knob. I assume
the partner isn't running a vanilla distro build and could set it as they see fit.