[RFC PATCH v1] cpu/suspend: Do a partial hotplug during suspend
From: Saravana Kannan
Date: Mon Nov 18 2024 - 21:05:34 EST
The hotplug state machine goes through 100+ states when transitioning
from online to offline. And on the way back, it goes through these
states in reverse.
When a CPU goes offline, some of the states that occur after a CPU is
powered off are about freeing up various per-CPU resources like
kmalloc caches, pages, network buffers, etc. All of these states make
sense when a CPU is permanently hotplugged off.
However, when offlining a CPU during suspend, we just want to power
down the CPUs so that the system can enter suspend. In this scenario,
we could simply stop the hotplug state machine right after the CPU has
been powered off. During resume, we can simply bring the CPU back to an
online state from the state where we paused the offline.
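
To illustrate the idea, here is a minimal toy model (not kernel code).
The state names, the number of states, and the helper functions are all
made up for illustration; the real state machine has 100+ states and
runs a callback per state. The sketch only counts how many steps a
suspend/resume round trip executes when the CPU is taken fully offline
versus parked right after power-off.

```c
#include <assert.h>

/* Toy states, lowest = fully offline. Names are illustrative only. */
enum toy_state {
	TOY_OFFLINE = 0,	/* all per-CPU resources freed */
	TOY_FREE_CACHES,	/* cleanup step that runs after power-off */
	TOY_FREE_BUFFERS,	/* cleanup step that runs after power-off */
	TOY_PARKED,		/* CPU powered off, resources still allocated */
	TOY_STOP_CPU,		/* step that actually powers the CPU off/on */
	TOY_ONLINE,
};

struct toy_cpu {
	enum toy_state state;
	unsigned int callbacks_run;	/* steps executed so far */
};

/* Walk down one state at a time until 'target' is reached. */
void toy_cpu_down(struct toy_cpu *cpu, enum toy_state target)
{
	assert(target <= cpu->state);
	while (cpu->state > target) {
		cpu->state--;
		cpu->callbacks_run++;
	}
}

/* Walk back up one state at a time until 'target' is reached. */
void toy_cpu_up(struct toy_cpu *cpu, enum toy_state target)
{
	assert(target >= cpu->state);
	while (cpu->state < target) {
		cpu->state++;
		cpu->callbacks_run++;
	}
}

/* Steps executed for one suspend/resume round trip via 'park_state'. */
unsigned int toy_round_trip(enum toy_state park_state)
{
	struct toy_cpu cpu = { .state = TOY_ONLINE, .callbacks_run = 0 };

	toy_cpu_down(&cpu, park_state);
	toy_cpu_up(&cpu, TOY_ONLINE);
	return cpu.callbacks_run;
}
```

In this toy, toy_round_trip(TOY_OFFLINE) walks every state twice, while
toy_round_trip(TOY_PARKED) skips the cleanup steps below the parked
state in both directions.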
This saves time during both suspend and resume, and the savings are
proportional to the number of CPUs in the system. So, on systems with
a large number of CPUs, we can expect a large amount of time saved.
On a Pixel 6, averaging across 100+ suspend/resume cycles, the total
time to power off 7 of the 8 CPUs goes from 51 ms down to 24 ms.
Similarly, the average time to power off each individual CPU (the
per-CPU times differ) also goes down by about 50%.
The average time spent powering up CPUs goes down from 34 ms to 32 ms.
Keep in mind that the time saved during resume is not easily
quantified by looking at CPU onlining times. This is because the
actual time savings come later: per-CPU resources do not need to be
reallocated, which speeds up actions such as allocations that can
pick up memory from per-CPU kmalloc caches.
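
For reference, the relative savings implied by the totals quoted above
can be worked out with a small helper. The function name is made up for
illustration; the inputs are the measured totals from this message.

```c
/* Percentage reduction from 'before' to 'after' (same unit for both). */
double pct_reduction(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

pct_reduction(51.0, 24.0) evaluates to roughly 52.9 for the power-off
path, and pct_reduction(34.0, 32.0) to roughly 5.9 for the power-up
path.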
Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx>
---
Hi Thomas/Peter,
The hotplug state machine rewrite is great! Enables all kinds of
optimizations for suspend/resume.
About this patch, I'm not sure if the exact state the hotplug state
machine is paused at (CPUHP_WORKQUEUE_PREP) will work for all
archs/boards, but this is the general idea.
If it works as is, great! At a glance, it looks like it should work:
none of the other states between this and CPUHP_OFFLINE seem to be
touching hardware.
If CPUHP_WORKQUEUE_PREP doesn't work, then we can make it a config
option to select the state or an arch call or something along those
lines.
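
As a rough illustration of why the check in _cpu_up() changes along
with the target state, here is a simplified sketch. It assumes only the
relative order CPUHP_OFFLINE < CPUHP_WORKQUEUE_PREP < CPUHP_BP_KICK_AP
that the patch relies on. The toy enum and the two helper functions are
made up for illustration and are not the kernel's definitions.

```c
#include <stdbool.h>

/* Relative order only; the real enum cpuhp_state has many more entries. */
enum toy_cpuhp_state {
	TOY_CPUHP_OFFLINE = 0,
	TOY_CPUHP_WORKQUEUE_PREP,	/* where the patch parks a suspended CPU */
	TOY_CPUHP_BP_KICK_AP,		/* bound used by the patched check */
	TOY_CPUHP_ONLINE,
};

/* Old check: set up the idle thread only when starting from fully offline. */
bool needs_idle_setup_old(enum toy_cpuhp_state st)
{
	return st == TOY_CPUHP_OFFLINE;
}

/* Patched check: set it up for any state below the kick step. */
bool needs_idle_setup_new(enum toy_cpuhp_state st)
{
	return st < TOY_CPUHP_BP_KICK_AP;
}
```

With a CPU parked at the toy WORKQUEUE_PREP state, the old check
returns false and the idle thread setup would be skipped, while the
patched check still returns true. Both checks agree for a fully offline
CPU.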
What are your thoughts on this? How would you like me to proceed?
Btw, ideally, I'd have liked to have paused at CPUHP_TMIGR_PREPARE,
but the hrtimers unwinding seems to be broken if we fail/unwind before
reaching "hrtimers:prepare". I'll send out a separate email on that.
Thanks,
Saravana
kernel/cpu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index d293d52a3e00..ca21ac017471 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1649,7 +1649,7 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
 	if (st->state >= target)
 		goto out;
 
-	if (st->state == CPUHP_OFFLINE) {
+	if (st->state < CPUHP_BP_KICK_AP) {
 		/* Let it fail before we try to bring the cpu up */
 		idle = idle_thread_get(cpu);
 		if (IS_ERR(idle)) {
@@ -1931,7 +1931,7 @@ int freeze_secondary_cpus(int primary)
 		}
 
 		trace_suspend_resume(TPS("CPU_OFF"), cpu, true);
-		error = _cpu_down(cpu, 1, CPUHP_OFFLINE);
+		error = _cpu_down(cpu, 1, CPUHP_WORKQUEUE_PREP);
 		trace_suspend_resume(TPS("CPU_OFF"), cpu, false);
 		if (!error)
 			cpumask_set_cpu(cpu, frozen_cpus);
--
2.47.0.338.g60cca15819-goog