Re: 4.10-rc1: thinkpad x60: who ate my cpu?

From: Woody Suwalski
Date: Sun Feb 12 2017 - 10:44:20 EST


Woody Suwalski wrote:
Pavel Machek wrote:
On Sat 2017-01-14 12:30:54, Pavel Machek wrote:
Hi!

On Thu 2017-01-12 20:19:31, Woody Suwalski wrote:
Pavel Machek wrote:
Hi!

I used to have two cpus, and Thinkpad X60 should have two cores, but I
only see one on 4.10-rc1. This machine went through many
suspend/resume cycles. When backups finish, I'll try -rc2.
Whoever did it, he seems to have returned the cpu in -rc3. All seems
to be good now.
Actually, since you mentioned it, I checked my X60: same problem,
only one CPU. However, I was running 4.8.13 with an uptime of 33 days and
multiple sleep/wake-ups.
I installed the current EOL 4.8.17 and rebooted: I see 2 CPUs. So the issue
is older than the 4.10 kernel, and I suspect it is related to CPU hotplug /
wakeup...
Hmm. So I saw two cores in -rc3 after boot. But it is quite possible
that -rc1 was OK just after boot, too, and the problem happened
sometime later (probably during suspend/resume cycles). Let me go back
to -rc1 to check.
Indeed, in -rc1 I see both CPUs after boot. So we have a hard-to-reproduce
case where 4.8 to 4.10 kernels lose one of the CPU cores...



I managed to reproduce it, but again it took a long time: I have an uptime of 29 days.
It must have happened within the last day, as I kept checking as often as I remembered.
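Instead of checking by hand, the state could be polled from sysfs. A minimal Python sketch, assuming the standard `/sys/devices/system/cpu/present` and `/sys/devices/system/cpu/online` files, which use the kernel's cpulist format (for example `0-1`); the helper names `parse_cpulist`, `read_cpu_file` and `missing_cpus` are hypothetical:

```python
import os

CPU_SYSFS = "/sys/devices/system/cpu"  # standard location of the CPU hotplug files

def parse_cpulist(text):
    """Parse a kernel cpulist string such as "0-1" or "0,2-3" into a set of ids."""
    cpus = set()
    text = text.strip()
    if not text:  # e.g. an empty "offline" file
        return cpus
    for part in text.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus

def read_cpu_file(name, root=CPU_SYSFS):
    """Return the raw contents of e.g. /sys/devices/system/cpu/online."""
    with open(os.path.join(root, name)) as f:
        return f.read()

def missing_cpus(present_text, online_text):
    """CPUs that are present but not currently online, sorted."""
    return sorted(parse_cpulist(present_text) - parse_cpulist(online_text))
```

On this X60, `missing_cpus(read_cpu_file("present"), read_cpu_file("online"))` would be expected to return `[1]` once the core has gone missing; running it after every resume would narrow down which cycle loses it.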

The kernel is 4.8.17 EOL, installed almost a month ago.
Platform: ThinkPad X60, Intel(R) Core(TM) Duo CPU T2400 @ 1.83GHz

In dmesg, this is what a suspend/resume cycle looked like while both CPUs were still OK:
[690409.476107] PM: noirq suspend of devices complete after 79.914 msecs
[690409.476547] ACPI: Preparing to enter system sleep state S3
[690409.780081] ACPI : EC: EC stopped
[690409.780083] PM: Saving platform NVS memory
[690409.780284] Disabling non-boot CPUs ...
[690409.805284] smpboot: CPU 1 is now offline
[690409.816464] ACPI: Low-level resume complete
[690409.816464] ACPI : EC: EC started
[690409.816464] PM: Restoring platform NVS memory
[690409.816464] Enabling non-boot CPUs ...
[690409.840574] x86: Booting SMP configuration:
[690409.840576] smpboot: Booting Node 0 Processor 1 APIC 0x1
[690409.805271] Initializing CPU#1
[690409.805271] Disabled fast string operations
[690409.888252] cache: parent cpu1 should not be sleeping
[690409.920185] CPU1 is up
[690409.922288] ACPI: Waking up from system sleep state S3

Then CPU1 failed to come back up:

[691329.776108] PM: noirq suspend of devices complete after 79.941 msecs
[691329.776550] ACPI: Preparing to enter system sleep state S3
[691330.080081] ACPI : EC: EC stopped
[691330.080083] PM: Saving platform NVS memory
[691330.080284] Disabling non-boot CPUs ...
[691330.105303] smpboot: CPU 1 is now offline
[691330.116477] ACPI: Low-level resume complete
[691330.116477] ACPI : EC: EC started
[691330.116477] PM: Restoring platform NVS memory
[691330.116477] Enabling non-boot CPUs ...
[691330.140570] x86: Booting SMP configuration:
[691330.140572] smpboot: Booting Node 0 Processor 1 APIC 0x1
[691340.140015] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1
[691340.164445] Error taking CPU1 up: -5
[691340.166309] ACPI: Waking up from system sleep state S3

And now it looks like this (no CPU1 offline/online messages at all):
[692517.868523] ACPI: Preparing to enter system sleep state S3
[692518.172074] ACPI : EC: EC stopped
[692518.172076] PM: Saving platform NVS memory
[692518.172269] Disabling non-boot CPUs ...
[692518.172269] ACPI: Low-level resume complete
[692518.172269] ACPI : EC: EC started
[692518.172269] PM: Restoring platform NVS memory
[692518.172269] ACPI: Waking up from system sleep state S3
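The three excerpts above can be told apart by the messages they contain, which makes it possible to find the exact cycle in which CPU1 was lost. A rough Python sketch, assuming the message texts stay exactly as in these 4.8.17 logs; the function names and the labels `ok`, `failed`, `not-cycled` are hypothetical:

```python
import errno
import re

# Message patterns copied from the dmesg excerpts above (4.8.17 kernel).
SLEEP_MARK = "Preparing to enter system sleep state S3"
OFFLINE_RE = re.compile(r"smpboot: CPU (\d+) is now offline")
UP_RE = re.compile(r"CPU(\d+) is up")
FAIL_RE = re.compile(r"Error taking CPU(\d+) up: (-?\d+)")

def split_cycles(dmesg_text):
    """Split dmesg output into blocks, one per S3 suspend/resume cycle."""
    cycles, current = [], None
    for line in dmesg_text.splitlines():
        if SLEEP_MARK in line:
            current = []
            cycles.append(current)
        if current is not None:  # skip lines before the first suspend
            current.append(line)
    return cycles

def classify_cycle(lines):
    """Label one cycle: "ok", "failed", "not-cycled" or "unknown"."""
    text = "\n".join(lines)
    if FAIL_RE.search(text):
        return "failed"
    if UP_RE.search(text):
        return "ok"
    if not OFFLINE_RE.search(text):
        # No non-boot CPU was taken down: consistent with CPU1 already being offline.
        return "not-cycled"
    return "unknown"

def failure_details(lines):
    """(cpu, errno name) for each "Error taking CPUn up" line."""
    out = []
    for m in FAIL_RE.finditer("\n".join(lines)):
        code = -int(m.group(2))
        out.append((int(m.group(1)), errno.errorcode.get(code, str(code))))
    return out
```

Fed the full `dmesg` output, the first `failed` cycle marks the transition; on Linux the `-5` in "Error taking CPU1 up: -5" decodes to `EIO`.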

Is there any test I could run on the CPU wakeup while the machine is in that state?

Woody

Is there a way to kick the offline CPU back into operation from the /sys level?
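The kernel's CPU hotplug interface exposes a per-CPU `online` file, so one thing to try is writing `1` to `/sys/devices/system/cpu/cpu1/online` as root. A minimal Python sketch, assuming that path; `set_cpu_online` is a hypothetical helper, and whether the write succeeds after a failed `do_boot_cpu` is exactly what is unknown here (it may fail again):

```python
import os

def set_cpu_online(cpu, online=True, root="/sys/devices/system/cpu"):
    """Write 1 (or 0) to <root>/cpuN/online and report the resulting state.

    On a real system this needs root. If the kernel cannot bring the CPU up,
    the buffered write is flushed on close and raises OSError (presumably
    EIO, as in the resume log).
    """
    path = os.path.join(root, "cpu%d" % cpu, "online")
    with open(path, "w") as f:
        f.write("1" if online else "0")
    with open(path) as f:
        return f.read().strip() == "1"
```

The `root` parameter exists only so the helper can be exercised against a scratch directory instead of the live sysfs tree.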