Re: [PATCH] KVM: x86: Shove vp_bitmap handling down into sparse_set_to_vcpu_mask()
From: Sean Christopherson
Date: Fri Oct 29 2021 - 15:42:26 EST
On Fri, Oct 29, 2021, Sean Christopherson wrote:
> On Fri, Oct 29, 2021, Sean Christopherson wrote:
> > On Fri, Oct 29, 2021, Sean Christopherson wrote:
> > > On Fri, Oct 29, 2021, Vitaly Kuznetsov wrote:
> > > > > + /* If vp_index == vcpu_idx for all vCPUs, fill vcpu_mask directly. */
> > > > > + if (likely(!has_mismatch))
> > > > > + bitmap = (u64 *)vcpu_mask;
> > > > > +
> > > > > + memset(bitmap, 0, sizeof(vp_bitmap));
> > > >
> > > > ... but in the unlikely case has_mismatch == true 'bitmap' is still
> > > > uninitialized here, right? How doesn't it crash?
> > >
> > > I'm sure it does crash. I'll hack the guest to actually test this.
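
FWIW, the goof itself has a trivial fix: add an "else" arm so that "bitmap"
points at the on-stack vp_bitmap in the mismatch case, e.g. (completely
untested):

	/*
	 * If vp_index == vcpu_idx for all vCPUs, fill vcpu_mask directly,
	 * else fill the local vp_bitmap and convert it to vcpu_mask after.
	 */
	if (likely(!has_mismatch))
		bitmap = (u64 *)vcpu_mask;
	else
		bitmap = vp_bitmap;

	memset(bitmap, 0, sizeof(vp_bitmap));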
> >
> > Crash confirmed. But I don't feel too bad about my one-line goof because the
> > existing code botches sparse VP_SET, i.e. _EX flows. The spec requires the guest
> > to explicitly specify the number of QWORDS in the variable header[*], e.g. VP_SET
> > in this case, but KVM ignores that and does a harebrained calculation to "count"
> > the number of sparse banks. It does this by counting the number of bits set in
> > valid_bank_mask, which is comically broken because (a) the whole "sparse" thing
> > should be a clue that the banks are not packed together, (b) the spec clearly
> > states that "bank = VPindex / 64", (c) the sparse_bank madness makes this waaaay
> > more complicated than it needs to be, and (d) the massive sparse_bank allocation
> > on the stack is completely unnecessary because KVM simply ignores everything that
> > wouldn't fit in vp_bitmap.
> >
> > To reproduce, stuff vp_index in descending order starting from KVM_MAX_VCPUS - 1.
> >
> > hv_vcpu->vp_index = KVM_MAX_VCPUS - vcpu->vcpu_idx - 1;
> >
> > E.g. with an 8 vCPU guest, KVM will calculate sparse_banks_len=1, read zeros, and
> > do nothing, hanging the guest because it never sends IPIs.
>
> Ugh, I can't read. The example[*] clarifies that the "sparse" VP_SET packs things
> into BankContents. I don't think I imagined my guest hanging though, so something
> is awry. Back to debugging...
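
(Concretely, targeting vp_index 5 and vp_index 130 yields valid_bank_mask =
BIT(0) | BIT(2), with bit 5 set in BankContents[0] and bit 2 (130 % 64) set
in BankContents[1], i.e. the bank _contents_ are packed together even though
the bank _indices_ are sparse, so sizing the read by counting the bits set
in valid_bank_mask is correct after all.)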
Found the culprit. When __send_ipi_mask_ex() (in the guest) sees that the target
set is all present CPUs, it forgoes the sparse VP_SET and goes straight to
HV_GENERIC_SET_ALL, but still uses the _EX variant of the hypercall. KVM
mishandles that case by skipping the IPIs altogether when there are no
sparse banks. The spec says that
it's legal for there to be no sparse banks if the data is not needed, which is
the case here since the format is not sparse.
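
Paraphrasing from memory (so take the details with a grain of salt), the
guilty sequence in kvm_hv_send_ipi() looks something like:

	all_cpus = send_ipi_ex.vp_set.format == HV_GENERIC_SET_ALL;

	sparse_banks_len = bitmap_weight(&valid_bank_mask, 64) *
			   sizeof(sparse_banks[0]);

	/*
	 * With HV_GENERIC_SET_ALL, valid_bank_mask == 0, so this "succeeds"
	 * without sending a single IPI.
	 */
	if (!sparse_banks_len)
		goto ret_success;

so one way to fix it is to take the early-out only when the set is actually
sparse, e.g. (untested):

	if (!sparse_banks_len && !all_cpus)
		goto ret_success;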