Re: [PATCH v2 6/7] KVM: VMX: Check Intel PT related CPUID leaves

From: Sean Christopherson
Date: Thu Sep 09 2021 - 17:42:08 EST


On Fri, Aug 27, 2021, Xiaoyao Li wrote:
> CPUID 0xD leaves reports the capabilities of Intel PT, e.g. it decides
> which bits are valid to be set in MSR_IA32_RTIT_CTL, and reports the
> number of PT ADDR ranges.
>
> KVM needs to check that guest CPUID values set by userspace doesn't
> enable any bit which is not supported by bare metal. Otherwise,
> 1. it will trigger vm-entry failure if hardware unsupported bit is
> exposed to guest and set by guest.
> 2. it triggers #GP when context switch PT MSRs if exposing more
> RTIT_ADDR* MSRs than hardware capacity.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@xxxxxxxxx>
> ---
> There is bit 31 of CPUID(0xD, 0).ECX that doesn't restrict any bit in
> MSR_IA32_RTIT_CTL. If guest has different value than host, it won't
> cause any vm-entry failure, but guest will parse the PT packet with
> wrong format.
>
> I also check it to be same as host to ensure the virtualization correctness.
>
> Changes in v2:
> - Call out that if configuring more PT ADDR MSRs than hardware, it can
> cause #GP when context switch.
> ---
> arch/x86/kvm/cpuid.c | 25 +++++++++++++++++++++++++
> 1 file changed, 25 insertions(+)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 739be5da3bca..0c8e06a24156 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -76,6 +76,7 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
> static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
> {
> struct kvm_cpuid_entry2 *best;
> + u32 eax, ebx, ecx, edx;
>
> /*
> * The existing code assumes virtual address is 48-bit or 57-bit in the
> @@ -89,6 +90,30 @@ static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
> return -EINVAL;
> }
>
> + /*
> + * CPUID 0xD leaves tell Intel PT capabilities, which decides

CPUID.0xD is XSAVE state, CPUID.0x14 is Intel PT. This series needs tests...

> + * pt_desc.ctl_bitmask in later update_intel_pt_cfg().
> + *
> + * pt_desc.ctl_bitmask decides the legal value for guest
> + * MSR_IA32_RTIT_CTL. KVM cannot support PT capabilities beyond native,
> + * otherwise it will trigger vm-entry failure if guest sets native
> + * unsupported bits in MSR_IA32_RTIT_CTL.
> + */
> + best = cpuid_entry2_find(entries, nent, 0xD, 0);
> + if (best) {
> + cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx);
> + if (best->ebx & ~ebx || best->ecx & ~ecx)
> + return -EINVAL;
> + }
> + best = cpuid_entry2_find(entries, nent, 0xD, 1);
> + if (best) {
> + cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx);
> + if (((best->eax & 0x7) > (eax & 0x7)) ||

Ugh, looking at the rest of the code, even this isn't sufficient because
pt_desc.guest.addr_{a,b} are hardcoded at 4 entries, i.e. running KVM on hardware
with >4 entries will lead to buffer overflows.

One option would be to bump that to the theoretical max of 15, which doesn't seem
too horrible, especially if pt_desc as a whole is allocated on-demand, which it
probably should be since it isn't exactly tiny (nor ubiquitous)

A different option would be to let userspace define whatever it wants for guest
CPUID, and instead cap nr_addr_ranges at min(host.cpuid, guest.cpuid, RTIT_ADDR_RANGE).

Letting userspace generate a bad MSR_IA32_RTIT_CTL is not problematic, there are
plenty of ways userspace can deliberately trigger VM-Entry failure due to invalid
guest state (even if this is a VM-Fail condition, it's not a danger to KVM).

> + ((best->eax & ~eax) >> 16) ||
> + (best->ebx & ~ebx))
> + return -EINVAL;
> + }
> +
> return 0;
> }
>
> --
> 2.27.0
>