Re: Linux 5.3-rc7

From: Thomas Gleixner
Date: Sat Sep 07 2019 - 11:00:23 EST


On Sat, 7 Sep 2019, Chris Wilson wrote:
> Quoting Thomas Gleixner (2019-09-07 15:29:19)
> > On Sat, 7 Sep 2019, Chris Wilson wrote:
> > > Quoting Linus Torvalds (2019-09-02 18:28:26)
> > > > Bandan Das:
> > > > x86/apic: Include the LDR when clearing out APIC registers
> > >
> > > Apologies if this is known already, I'm way behind on email.
> > >
> > > I've bisected
> > >
> > > [ 18.693846] smpboot: CPU 0 is now offline
> > > [ 19.707737] smpboot: Booting Node 0 Processor 0 APIC 0x0
> > > [ 29.707602] smpboot: do_boot_cpu failed(-1) to wakeup CPU#0
> > >
> > > https://intel-gfx-ci.01.org/tree/drm-tip/igt@perf_pmu@cpu-hotplug.html
> > >
> > > to 558682b52919. (Reverts cleanly and fixes the problem.)
> > >
> > > I'm guessing that this is also behind the suspend failures, missing
> > > /dev/cpu/0/msr, and random perf_event_open() failures we have observed
> > > in our CI since -rc7 across all generations of Intel cpus.
> >
> > So is this on bare metal or in a VM?
>
> Our single virtualised piece of kit doesn't support cpu hotplug, so this
> test is not being run. We have failures on
> icl (2019), glk (2017), kbl (2017), bxt (2016), skl (2015),
> bsw (2016), hsw (2013), byt (2013), snb (2011), elk (2008),
> bwr (2006), blb (2007)

Ok, let me find a testbox to figure out what's wrong there.

Does this only happen with that CPU0 hotplug stuff enabled or on CPUs other
than CPU0 as well? That hotplug CPU0 stuff is a bandaid so I wouldn't be
surprised if we broke that somehow.

Thanks,

tglx