Re: [PATCH v2 5/8] KVM: x86: Don't bother reading sparse banks that end up being ignored

From: Vitaly Kuznetsov
Date: Mon Nov 01 2021 - 05:51:48 EST


Sean Christopherson <seanjc@xxxxxxxxxx> writes:

> When handling "sparse" VP_SET requests, don't read sparse banks that
> can't possibly contain a legal VP index instead of ignoring such banks
> later on in sparse_set_to_vcpu_mask(). This allows KVM to cap the size
> of its sparse_banks arrays for VP_SET at KVM_HV_MAX_SPARSE_VCPU_SET_BITS.
>
> Reducing the size of sparse_banks fudges around a compilation warning
> (that becomes error with KVM_WERROR=y) when CONFIG_KASAN_STACK=y, which
> is selected (and can't be unselected) by CONFIG_KASAN=y when using gcc
> (clang/LLVM is a stack hog in some cases so it's opt-in for clang).
> KASAN_STACK adds a redzone around every stack variable, which pushes the
> Hyper-V functions over the default limit of 1024.
>
> Ideally, KVM would flat out reject such impossibilities, but the TLFS
> explicitly allows providing empty banks, even if a bank can't possibly
> contain a valid VP index due to its position exceeding KVM's max.
>
> Furthermore, for a bit 1 in ValidBankMask, it is valid state for the
> corresponding element in BanksContents can be all 0s, meaning no
> processors are specified in this bank.
>
> Arguably KVM should reject and not ignore the "extra" banks, but that can
> be done independently and without bloating sparse_banks, e.g. by reading
> each "extra" 8-byte chunk individually.
>
> Reported-by: Ajay Garg <ajaygargnsit@xxxxxxxxx>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
> arch/x86/kvm/hyperv.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 3d0981163eed..8832727d74d9 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1753,11 +1753,16 @@ struct kvm_hv_hcall {
> static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
> u64 *sparse_banks, gpa_t offset)
> {
> + u16 var_cnt;
> +
> if (hc->var_cnt > 64)
> return -EINVAL;
>
> + /* Ignore banks that cannot possibly contain a legal VP index. */
> + var_cnt = min_t(u16, hc->var_cnt, KVM_HV_MAX_SPARSE_VCPU_SET_BITS);
> +

One may wonder why we're mixing up VP indices and VCPU ids (caped by
KVM_MAX_VCPUS) here as these don't have to match. The following commit
sheds some light:

commit 9170200ec0ebad70e5b9902bc93e2b1b11456a3b
Author: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
Date: Wed Aug 22 12:18:28 2018 +0200

KVM: x86: hyperv: enforce vp_index < KVM_MAX_VCPUS

Hyper-V TLFS (5.0b) states:

> Virtual processors are identified by using an index (VP index). The
> maximum number of virtual processors per partition supported by the
> current implementation of the hypervisor can be obtained through CPUID
> leaf 0x40000005. A virtual processor index must be less than the
> maximum number of virtual processors per partition.

Forbid userspace to set VP_INDEX above KVM_MAX_VCPUS. get_vcpu_by_vpidx()
can now be optimized to bail early when supplied vpidx is >= KVM_MAX_VCPUS.

> return kvm_read_guest(kvm, hc->ingpa + offset, sparse_banks,
> - hc->var_cnt * sizeof(*sparse_banks));
> + var_cnt * sizeof(*sparse_banks));
> }
>
> static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool ex)
> @@ -1770,7 +1775,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
> DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
> unsigned long *vcpu_mask;
> u64 valid_bank_mask;
> - u64 sparse_banks[64];
> + u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
> bool all_cpus;
>
> if (!ex) {
> @@ -1894,7 +1899,7 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
> DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
> unsigned long *vcpu_mask;
> unsigned long valid_bank_mask;
> - u64 sparse_banks[64];
> + u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
> u32 vector;
> bool all_cpus;

Saves the day until KVM_MAX_VCPUS goes above 4096 (and when it does the
problem strikes back even worse as KVM_HV_MAX_SPARSE_VCPU_SET_BITS is
not caped at '64'). As we're good for now,

Reviewed-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>

(I'd even suggest we add BUILD_BUG_ON(KVM_HV_MAX_SPARSE_VCPU_SET_BITS > 64))

Going forward, we can probably get rid of thes on-stack allocations
completely by either allocating these 512 bytes dynamically (lazily)
upon first usage or just adding a field to 'struct kvm_vcpu_hv' -- which
is being allcated dynamically nowadays so non-Windows guests won't suffer.

--
Vitaly