Re: [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching

From: Paolo Bonzini
Date: Fri May 17 2024 - 13:58:10 EST


On Fri, May 17, 2024 at 7:39 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> * Disallow KVM_CAP_X86_DISABLE_EXITS after vCPU creation
> * Reject disabling of MWAIT/HLT interception when not allowed
> * Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID.

This is technically a breaking change, and it's even documented in
api.rst under "KVM_GET_SUPPORTED_CPUID issues":

---
CPU[EAX=1]:ECX[21] (X2APIC) is reported by
``KVM_GET_SUPPORTED_CPUID``, but it can only be enabled if
``KVM_CREATE_IRQCHIP`` or ``KVM_ENABLE_CAP(KVM_CAP_IRQCHIP_SPLIT)``
are used to enable in-kernel emulation of the local APIC.

The same is true for the ``KVM_FEATURE_PV_UNHALT`` paravirtualized feature.

CPU[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by
``KVM_GET_SUPPORTED_CPUID``. It can be enabled if
``KVM_CAP_TSC_DEADLINE_TIMER`` is present and the kernel has enabled
in-kernel emulation of the local APIC.
---

However, I think we can get away with it. QEMU's source code, on the one hand, does:

    /* tsc-deadline flag is not returned by GET_SUPPORTED_CPUID, but it
     * can be enabled if the kernel has KVM_CAP_TSC_DEADLINE_TIMER,
     * and the irqchip is in the kernel.
     */
    if (kvm_irqchip_in_kernel() &&
            kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
        ret |= CPUID_EXT_TSC_DEADLINE_TIMER;
    }

    /* x2apic is reported by GET_SUPPORTED_CPUID, but it can't be enabled
     * without the in-kernel irqchip
     */
    if (!kvm_irqchip_in_kernel()) {
        ret &= ~CPUID_EXT_X2APIC;
    }

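so it has to cope with the existing mess, but it does not expect the
opposite mess (understandably): TSC_DEADLINE being reported by
KVM_GET_SUPPORTED_CPUID when there is no in-kernel irqchip, in which
case nothing clears the bit.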
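
For illustration, a userspace that copes with both directions would clear
the TSC deadline bit as well as set it. This is only a minimal sketch, not
QEMU's code: the function name fixup_cpuid_1_ecx and the macro names are
made up here, and the two booleans stand in for kvm_irqchip_in_kernel()
and kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX feature bits (macro names are local to this sketch). */
#define CPUID_1_ECX_X2APIC       (1u << 21)
#define CPUID_1_ECX_TSC_DEADLINE (1u << 24)

/*
 * Hypothetical helper: adjust the CPUID.01H:ECX value obtained from
 * KVM_GET_SUPPORTED_CPUID so the result is the same whether or not the
 * kernel already advertises TSC_DEADLINE there.
 */
static uint32_t fixup_cpuid_1_ecx(uint32_t supported_ecx,
                                  bool irqchip_in_kernel,
                                  bool has_tsc_deadline_cap)
{
    uint32_t ecx = supported_ecx;

    if (irqchip_in_kernel) {
        /* Old kernels: the bit is not reported, so add it by hand. */
        if (has_tsc_deadline_cap) {
            ecx |= CPUID_1_ECX_TSC_DEADLINE;
        }
    } else {
        /* Neither feature works without the in-kernel local APIC. */
        ecx &= ~(CPUID_1_ECX_X2APIC | CPUID_1_ECX_TSC_DEADLINE);
    }

    return ecx;
}
```

With the in-kernel irqchip the outcome matches what QEMU computes today;
the only new behavior is that the TSC deadline bit is dropped, together
with X2APIC, when the local APIC is not emulated in the kernel.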

However, in practice the userspace APIC has always been utterly broken,
and it is even deprecated in QEMU, so we might get away with it. I don't
see why one would use no kernel APIC unless the guest has no APIC whatsoever.

And no guest that fails to find an APIC is going to use the TSC
deadline timer (sure, the MSR is outside the x2APIC space, but how in
the world would you configure LVTT?). Likewise for X2APIC, since you
need to turn it on at 0xFEE0_0000 first.
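
To spell out the guest side of that argument, here is a rough sketch. It
assumes a guest that gates both features on the CPUID.01H:EDX[9] APIC
flag; the helper and macro names are hypothetical and not taken from any
real guest kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H feature bits (macro names are local to this sketch). */
#define CPUID_1_EDX_APIC         (1u << 9)
#define CPUID_1_ECX_X2APIC       (1u << 21)
#define CPUID_1_ECX_TSC_DEADLINE (1u << 24)

/* Assumed guest policy: without a local APIC there is no LVTT to program. */
static bool guest_may_use_tsc_deadline(uint32_t ecx, uint32_t edx)
{
    return (edx & CPUID_1_EDX_APIC) && (ecx & CPUID_1_ECX_TSC_DEADLINE);
}

/* Assumed guest policy: x2APIC is only considered once an APIC was found. */
static bool guest_may_use_x2apic(uint32_t ecx, uint32_t edx)
{
    return (edx & CPUID_1_EDX_APIC) && (ecx & CPUID_1_ECX_X2APIC);
}
```

Under that assumption, a stray TSC_DEADLINE or X2APIC bit is harmless
whenever the guest is not shown an APIC at all.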

Paolo