Re: smpboot: do_boot_cpu failed(-1) to wakeup CPU#0

From: Tvrtko Ursulin
Date: Tue Feb 13 2018 - 09:52:04 EST



Hi,

On 13/02/18 14:39, Thomas Gleixner wrote:
On Tue, 13 Feb 2018, Tvrtko Ursulin wrote:
On 07/02/18 12:48, Tvrtko Ursulin wrote:
We are seeing failures to online CPU0 on Apollo Lake in the form of:

<6>[ 126.508783] smpboot: CPU 0 is now offline
<6>[ 127.520746] smpboot: Booting Node 0 Processor 0 APIC 0x0
<3>[ 137.521036] smpboot: do_boot_cpu failed(-1) to wakeup CPU#0

I unfortunately cannot say with which kernel version this started, since
we only recently added a test which off-lines CPUs and on-lines them
back, and that is when we started seeing this. I also have no local
access to this machine. (It is part of a test farm for i915 driver
development testing.) Small reproducer looks like this (without
boilerplate):

Any hints on how to debug this? Could it be firmware? Try some boot options or
something?

There are issues with CPU0 hotplug on commodity hardware. I have systems
where it does not work, but TBH I never bothered to investigate it. Some
years ago we had issues with suspend/resume when it was not running on
CPU0. These were related to firmware assumptions about CPU0. So I wouldn't
be too surprised if there are general issues with unplugging CPU0.

CPU0 unplug is really only relevant for systems which support physical
hotplug, so testing it on commodity hardware does not have much
value. Testing in VMs for increasing the test coverage works well enough.

Thanks, that explains it.

We actually use CPU hotplug only to test whether PMU event migration and accounting work as expected in the i915 PMU. And since, luckily, the issue with CPU0 hotplug manifests on only one of the test systems, I think we will just skip this test on that machine.
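
For illustration only, a minimal sketch of how a test could cycle a CPU and tell "cannot be unplugged here, skip" apart from "failed to come back". This is Python rather than the actual i915 test code, and it assumes the standard sysfs control file /sys/devices/system/cpu/cpuN/online. The helper names and the sysfs_root parameter are hypothetical, added so the logic can be exercised without touching real CPUs.

```python
import os


def set_cpu_online(cpu, online, sysfs_root="/sys/devices/system/cpu"):
    """Write 1 or 0 to cpuN/online. Return True on success, False otherwise.

    Hypothetical helper: sysfs_root is a parameter only so the function
    can be pointed at a scratch directory instead of the real sysfs.
    """
    path = os.path.join(sysfs_root, f"cpu{cpu}", "online")
    # A CPU without an 'online' control file cannot be hotplugged from here.
    if not os.path.isfile(path):
        return False
    try:
        # The kernel reports a failed state change as a write error,
        # which surfaces as OSError on write or on close.
        with open(path, "w") as f:
            f.write("1" if online else "0")
    except OSError:
        return False
    return True


def cycle_cpu(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Offline a CPU and bring it back.

    Returns "skip" if the CPU could not be taken offline (no control
    file, no permission, or the kernel refused), "fail" if it went
    offline but could not be brought back, and "ok" otherwise.
    """
    if not set_cpu_online(cpu, False, sysfs_root):
        return "skip"
    if not set_cpu_online(cpu, True, sysfs_root):
        return "fail"
    return "ok"
```

A test could call cycle_cpu(n) for each CPU it cares about and treat "skip" as a skipped subtest. The failure reported in this thread would show up as "fail" for CPU 0, which is the case we would skip on the affected machine.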

Thanks again!

Tvrtko