Re: [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits

From: Juergen Gross
Date: Thu Nov 18 2021 - 10:19:22 EST


On 18.11.21 16:09, Sean Christopherson wrote:
On Thu, Nov 18, 2021, Juergen Gross wrote:
On 18.11.21 00:46, Sean Christopherson wrote:
On Wed, Nov 17, 2021, Juergen Gross wrote:
On 16.11.21 15:10, Juergen Gross wrote:
Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set
via a #define in a header file.

In order to support higher vcpu-ids without generally increasing the
memory consumption of guests on the host (some guest structures contain
arrays sized by KVM_MAX_VCPU_IDS) add a boot parameter for adding some
bits to the vcpu-id. Additional bits are needed as the vcpu-id is
constructed via bit-wise concatenation of socket-id, core-id, etc.
As those ids maximum values are not always a power of 2, the vcpu-ids
are sparse.

The additional number of bits needed is basically the number of
topology levels with a non-power-of-2 maximum value, excluding the top
most level.

The default value of the new parameter will be 2 in order to support
today's possible topologies. The special value of -1 will use the
number of bits needed for a guest with the current host's topology.

Calculating the maximum vcpu-id dynamically requires to allocate the
arrays using KVM_MAX_VCPU_IDS as the size dynamically.

Signed-of-by: Juergen Gross <jgross@xxxxxxxx>

Just thought about vcpu-ids a little bit more.

It would be possible to replace the topology games completely by an
arbitrary rather high vcpu-id limit (65536?) and to allocate the memory
depending on the max vcpu-id just as needed.

Right now the only vcpu-id dependent memory is for the ioapic consisting
of a vcpu-id indexed bitmap and a vcpu-id indexed byte array (vectors).

We could start with a minimal size when setting up an ioapic and extend
the areas in case a new vcpu created would introduce a vcpu-id outside
the currently allocated memory. Both arrays are protected by the ioapic
specific lock (at least I couldn't spot any unprotected usage when
looking briefly into the code), so reallocating those arrays shouldn't
be hard. In case of ENOMEM the related vcpu creation would just fail.

Thoughts?

Why not have userspace state the max vcpu_id it intends to creates on a per-VM
basis? Same end result, but doesn't require the complexity of reallocating the
I/O APIC stuff.


And if the userspace doesn't do it (like today)?

Similar to my comments in patch 4, KVM's current limits could be used as the
defaults, and any use case wanting to go beyond that would need an updated
userspace. Exceeding those limits today doesn't work, so there's no ABI breakage
by requiring a userspace change.

Hmm, nice idea. Will look into it.

Or again, this could be a Kconfig knob, though that feels a bit weird in this case.
But it might make sense if it can be tied to something in the kernel's config?

Having a Kconfig knob for an absolute upper bound of vcpus should
be fine. If someone doesn't like the capability to explicitly let
qemu create very large VMs, he/she can still set that upper bound
to the normal KVM_MAX_VCPUS value.

Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature