Re: [PATCH 25/25] KVM: x86: Add CPUID bits missing from KVM_GET_SUPPORTED_CPUID

From: Paolo Bonzini
Date: Thu Sep 12 2024 - 10:09:50 EST


On Thu, Sep 12, 2024 at 9:48 AM Xiaoyao Li <xiaoyao.li@xxxxxxxxx> wrote:
> On 9/11/2024 1:52 AM, Paolo Bonzini wrote:
> > On 8/13/24 00:48, Rick Edgecombe wrote:
> >> For KVM_TDX_CAPABILITIES, there was also the problem of bits that are
> >> actually supported by KVM, but missing from get_supported_cpuid() for one
> >> reason or another. These include X86_FEATURE_MWAIT, X86_FEATURE_HT and
> >> X86_FEATURE_TSC_DEADLINE_TIMER. This is currently worked around in
> >> QEMU by
> >> adjusting which features are expected.
>
> I'm not sure what issue/problem can be worked around in QEMU.
> QEMU doesn't expect these bit are reported by KVM as supported for TDX.
> QEMU just accepts the result reported by KVM.

QEMU already adds some extra bits, for example:

ret |= CPUID_EXT_HYPERVISOR;
if (kvm_irqchip_in_kernel() &&
kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
ret |= CPUID_EXT_TSC_DEADLINE_TIMER;
}

> The problem is, TDX module and the hardware allow these bits be
> configured for TD guest, but KVM doesn't allow. It leads to users cannot
> create a TD with these bits on.

KVM is not going to have any checks, it's only going to pass the
CPUID to the TDX module and return an error if the check fails
in the TDX module.

KVM can have a TDX-specific version of KVM_GET_SUPPORTED_CPUID, so
that we can keep a variant of the "get supported bits and pass them
to KVM_SET_CPUID2" logic, but that's it.

> > This is the kind of API that we need to present for TDX, even if the
> > details on how to get the supported CPUID are different. Not because
> > it's a great API, but rather because it's a known API.
>
> However there are differences for TDX. For legacy VMs, the result of
> KVM_GET_SUPPORTED_CPUID isn't used to filter the input of KVM_SET_CPUID2.
> But for TDX, it needs to filter the input of KVM_TDX_VM_INIT.CPUID[]
> because TDX module only allows the bits that are reported as
> configurable to be set to 1.

Yes, that's userspace's responsibility.

> With current designed API, QEMU can only know which bits are
> configurable before KVM_TDX_VM_INIT, i.e., which bits can be set to 1 or
> 0 freely.

The API needs userspace to have full knowledge of the
requirements of the TDX module, if it wants to change the
defaults provided by KVM.

This is the same as for non-TDX VMs (including SNP). The only
difference is that TDX and SNP fails, while non-confidential VMs
get slightly garbage CPUID.

> For other bits not reported as configurable, QEMU can know the exact
> value of them via KVM_TDX_GET_CPUID, after KVM_TDX_VM_INIT and before
> TD's running. With it, QEMU can validate the return value is matched
> with what QEMU wants to set that determined by users input. If not
> matched, QEMU can provide some warnings like what for legacy VMs:
>
> - TDX doesn't support requested feature: CPUID.01H.ECX.tsc-deadline
> [bit 24]
> - TDX forcibly sets features: CPUID.01H:ECX.hypervisor [bit 31]
>
> If there are ioctls to report the fixed0 bits and fixed1 bits for TDX,
> QEMU can validate the user's configuration earlier.

Yes, that's fine.

Paolo