Re: [BUG] Guest OSes die simultaneously (bisected)

From: Paolo Bonzini
Date: Thu Jan 04 2024 - 10:00:56 EST


On Thu, Jan 4, 2024 at 3:58 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Wed, Jan 03, 2024 at 05:00:35PM -0800, Paul E. McKenney wrote:
> > On Wed, Jan 03, 2024 at 04:24:06PM -0800, Sean Christopherson wrote:
> > > On Wed, Jan 03, 2024, Paul E. McKenney wrote:
> > > > On Wed, Jan 03, 2024 at 02:22:23PM -0800, Paul E. McKenney wrote:
> > > > > Hello!
> > > > >
> > > > > Since some time between v5.19 and v6.4, long-running rcutorture tests
> > > > > would (rarely but intolerably often) have all guests on a given host die
> > > > > simultaneously with something like an instruction fault or a segmentation
> > > > > violation.
> > > > >
> > > > > Each bisection step required 20 hosts running 10 hours each, and
> > > > > this eventually fingered commit c59a1f106f5c ("KVM: x86/pmu: Add
> > > > > IA32_PEBS_ENABLE MSR emulation for extended PEBS"). Although this commit
> > > > > is certainly messing with things that could possibly cause all manner
> > > > > of mischief, I don't immediately see a smoking gun. Except that the
> > > > > commit prior to this one is rock solid.
> > > > > Just to make things a bit more exciting, bisection in mainline proved
> > > > > to be problematic due to bugs of various kinds that hid this one. I was
> > > > > therefore forced to bisect among the commits backported to the internal
> > > > > v5.19-based kernel, which fingered the backported version of the patch
> > > > > called out above.
> > > >
> > > > Ah, and so why do I believe that this is a problem in mainline rather
> > > > than just (say) a backporting mistake?
> > > >
> > > > Because this issue was first located in v6.4, which already has this
> > > > commit included.
> > > >
> > > > Thanx, Paul
> > > >
> > > > > Please note that this is not (yet) an emergency. I will just continue
> > > > > to run rcutorture on v5.19-based hypervisors in the meantime.
> > > > >
> > > > > Any suggestions for debugging or fixing?
> > >
> > > This looks suspect:
> > >
> > > + u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable;
> > > + int global_ctrl, pebs_enable;
> > >
> > > - arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
> > > - arr[0].host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
> > > - arr[0].guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask;
> > > - arr[0].guest &= ~(cpuc->pebs_enabled & x86_pmu.pebs_capable);
> > > - *nr = 1;
> > > + *nr = 0;
> > > + global_ctrl = (*nr)++;
> > > + arr[global_ctrl] = (struct perf_guest_switch_msr){
> > > + .msr = MSR_CORE_PERF_GLOBAL_CTRL,
> > > + .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
> > > + .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
> > > + };
> > >
> > >
> > > IIUC (always a big if with this code), the intent is that the guest's version of
> > > PERF_GLOBAL_CTRL gets bits that are (a) not exclusive to the host and (b) not
> > > being used for PEBS. (b) is necessary because PEBS generates records in memory
> > > using virtual addresses, i.e. the CPU will write to memory using a virtual address
> > > that is valid for the host but not the guest. And so PMU counters that are
> > > configured to generate PEBS records need to be disabled while running the guest.
> > >
> > > Before that commit, the logic was:
> > >
> > > guest[PERF_GLOBAL_CTRL] = ctrl & ~host;
> > > guest[PERF_GLOBAL_CTRL] &= ~pebs;
> > >
> > > But after, it's now:
> > >
> > > guest[PERF_GLOBAL_CTRL] = ctrl & (~host | ~pebs);
> > >
> > > I.e. the kernel is enabled counters in the guest that are not host-only OR not
> > > PEBS. E.g. if only counter 0 is in use, it's using PEBS, but it's not exclusive
> > > to the host, then the new code will yield (truncated to a single byte for sanity)
> > >
> > > 1 = 1 & (0xf | 0xe)
> > >
> > > and thus keep counter 0 enabled, whereas the old code would yield
> > >
> > > 1 = 1 & 0xf
> > > 0 = 1 & 0xe
> > >
> > > A bit of a shot in the dark and completed untested, but I think this is the correct
> > > fix?
> >
> > I am firing off some tests, and either way, thank you very much!!!
>
> Woo-hoo!!! ;-)
>
> Tested-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>
> Will you be sending a proper patch, or would you prefer that I do so?
> In the latter case, I would need your Signed-off-by.

I will fast track this one to Linus.

Paolo

> And again, thank you very much!!!
>
> Thanx, Paul
>
> > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > > index a08f794a0e79..92d5a3464cb2 100644
> > > --- a/arch/x86/events/intel/core.c
> > > +++ b/arch/x86/events/intel/core.c
> > > @@ -4056,7 +4056,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
> > > arr[global_ctrl] = (struct perf_guest_switch_msr){
> > > .msr = MSR_CORE_PERF_GLOBAL_CTRL,
> > > .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
> > > - .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
> > > + .guest = intel_ctrl & ~(cpuc->intel_ctrl_host_mask | pebs_mask),
> > > };
> > >
> > > if (!x86_pmu.pebs)
> > >
>