Re: smpboot: do_boot_cpu failed(-1) to wakeup CPU#0

From: Thomas Gleixner
Date: Tue Feb 13 2018 - 09:39:04 EST


On Tue, 13 Feb 2018, Tvrtko Ursulin wrote:
> On 07/02/18 12:48, Tvrtko Ursulin wrote:
> > We are seeing failures to online the CPU0 on Apollo Lake in the form of:
> >
> > <6>[ 126.508783] smpboot: CPU 0 is now offline
> > <6>[ 127.520746] smpboot: Booting Node 0 Processor 0 APIC 0x0
> > <3>[ 137.521036] smpboot: do_boot_cpu failed(-1) to wakeup CPU#0
> >
> > I unfortunately cannot say which kernel version this started with, since
> > the test that triggers it was added only recently. I also have no local
> > access to this machine (it is part of a test farm for i915 driver
> > development testing). The test takes CPUs offline and brings them back
> > online, and we started seeing this failure once it was added. A small
> > reproducer looks like this (without boilerplate):
>
> Any hints on how to debug this? Could it be firmware? Should we try some
> boot options or something?

There are issues with CPU0 hotplug on commodity hardware. I have systems
where it does not work, but TBH I never bothered to investigate it. Some
years ago we had issues with suspend/resume when that code did not run on
CPU0. These were related to firmware assumptions about CPU0. So I wouldn't
be too surprised if there are general issues with unplugging CPU0.

CPU0 unplug is really only relevant for systems which support physical
hotplug, so testing it on commodity hardware does not have much
value. Testing in VMs to increase test coverage works well enough.
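For reference, the offline/online cycle described in the report can be sketched as below. This is an illustrative sketch, not the i915 test's actual reproducer (which is not quoted here). It assumes the standard sysfs CPU hotplug interface, /sys/devices/system/cpu/cpuN/online, and the helper names are hypothetical. On x86 kernels of this era CPU0 normally has no `online` file at all unless CPU0 hotplug was explicitly enabled, so trying this on CPU0 presupposes that.

```python
import os
import time

# Assumed location of the sysfs CPU hotplug interface.
SYSFS_CPU = "/sys/devices/system/cpu"


def online_path(cpu, root=SYSFS_CPU):
    """Return the path of the 'online' control file for a logical CPU."""
    return os.path.join(root, f"cpu{cpu}", "online")


def is_online(cpu, root=SYSFS_CPU):
    """Read the CPU's 'online' file; '1' means online."""
    with open(online_path(cpu, root)) as f:
        return f.read().strip() == "1"


def set_online(cpu, online, root=SYSFS_CPU):
    """Write '1' or '0' to the CPU's 'online' file.

    A failed hotplug operation surfaces as an OSError from the write.
    """
    with open(online_path(cpu, root), "w") as f:
        f.write("1" if online else "0")


def hotplug_cycle(cpu, root=SYSFS_CPU, settle=1.0):
    """Take a CPU offline, wait, bring it back; return True if it is online again.

    An OSError from the offline step propagates; a failure of the online
    step (the case in the report) is reported as False.
    """
    set_online(cpu, False, root)
    time.sleep(settle)
    try:
        set_online(cpu, True, root)
    except OSError:
        return False
    return is_online(cpu, root)
```

Run it as root, ideally inside a VM as suggested above; `hotplug_cycle(1)` returns False when the CPU fails to come back online.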

Thanks,

tglx