Re: [PATCH] Revert "ACPI: CPPC: Fix remaining for_each_possible_cpu() to use online CPUs"
From: Sean Kelley
Date: Fri Apr 17 2026 - 03:22:52 EST
On 4/16/26 8:46 PM, Jinjie Ruan wrote:
On 4/17/2026 11:00 AM, Sean Kelley wrote:
On 4/16/26 7:00 PM, Jinjie Ruan wrote:
On 4/17/2026 3:36 AM, Sean Kelley wrote:
On 4/16/26 1:52 AM, Jinjie Ruan wrote:
This reverts commit 56eb0c0ed345da7815274aa821a8546a073d7e97, because
this commit cause warning call trace below when concurrently
bringing up
and down two SMT threads of a physical core.
The issue timeline is as follows:
1. when the system starts,
cpufreq: cpu: 220, policy->related_cpus: 220-221, policy->cpus:
220-221
2. Offline cpu 220 and cpu 221.
3. Online cpu 220
- cpu 221 is now offline, as acpi_get_psd_map() use
for_each_online_cpu(),
so the cpu_data->shared_cpu_map, policy->cpus, and related_cpus
has only
cpu 220.
cpufreq: cpu: 220, related_cpus: 220, cpus: 220
4. offline cpu 220
5. online cpu 221, the below call trace occurs:
- Because cpu 220 and cpu 221 share one policy, and policy-
related_cpus= 220 after step 3, so cpu 221 is not in policy->related_cpus
but per_cpu(cpufreq_cpu_data, cpu221) is not NULL.
The _PSD (P-State Dependency) defines the hardware-level dependency of
frequency control across CPU cores. Since this relationship is a
physical
attribute of the hardware topology, it remains constant regardless
of the
online or offline status of the CPUs.
Using for_each_online_cpu() in acpi_get_psd_map() is problematic. If a
CPU is offline, it will be excluded from the shared_cpu_map.
Consequently, if that CPU is brought online later, the kernel will
fail to
recognize it as part of any shared frequency domain.
Switch back to for_each_possible_cpu() to ensure that all cores defined
in the ACPI tables are correctly mapped into their respective
performance
domains from the start. This aligns with the logic of policy-
related_cpus,which must encompass all potentially available cores in the domain to
prevent logic gaps during CPU hotplug operations.
Yep, agree that using for_each_online_cpu() in acpi_get_psd_map()
drops valid domain members and breaks the hotplug case you described.
But a plain revert also re-exposes the nosmt bug. On systems where a
possible CPU is never probed, per_cpu(cpc_desc_ptr, i) is NULL and
acpi_get_psd_map() currently hits goto err_fault instead of just
skipping that CPU.
I have a question regarding the original issue where it states 'This
breaks systems booted with "nosmt" or "nosmt=force"'. As far as I know,
the 'nosmt' parameter is only supported on x86. However, it seems the
current x86 kernel does not support enabling CONFIG_ACPI_CPPC_CPUFREQ
(only supported on arm64/arm/riscv). Could you please share the details
of your testing environment?
Yeah, good question. However, nosmt is actually defined generically in
kernel/cpu.c
(early_param("nosmt", smt_cmdline_disable)), not in any
arch-specific code, so it works on anything that selects HOTPLUG_SMT.
arm64 does:
arch/arm64/Kconfig select HOTPLUG_SMT if HOTPLUG_CPU
And acpi_get_psd_map() is called from drivers/cpufreq/cppc_cpufreq.c,
which has: depends on ARM || ARM64 || RISCV.
So the overlap where this bug actually triggers is arm64. Testing was
on an NVIDIA Vera (Olympus) platform booted with "nosmt". That's the
environment where the original -EFAULT failure in acpi_get_psd_map()
was reproduced.
After reverting commit 56eb0c0ed345 ("ACPI: CPPC: Fix remaining
for_each_possible_cpu() to use online CPUs"), I tried to reproduce your
original issue on my local ARM64 machine with the 'nosmt' boot parameter
enabled and I have confirmed that it works., but I couldn't trigger it
and the acpi cppc cpufreq driver successfully registered. Could you
please help test if the new fix resolves the problem you encountered
previously?
Best regards,
Jinjie
Hi Jinjie,
Applied v2 on current master and built for arm64 (NVIDIA Vera, has SMT).
Tested both scenarios:
1. nosmt boot -- cppc_cpufreq registers cleanly, no -EFAULT, no
warnings in dmesg.
2. Concurrent hotplug of SMT siblings (your original reproducer
adapted to valid CPU indices on my system). Ran the dual
offline/online loop for 30s. No WARN, no cpufreq_online call
trace.
Both pass.
Thanks!
Sean
So I think the fix is to restore for_each_possible_cpu() for the PSD
map, but change the NULL case to continue:
I agree.
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -530,7 +530,7 @@
match_cpc_ptr = per_cpu(cpc_desc_ptr, i);
if (!match_cpc_ptr)
- goto err_fault;
+ continue;
match_pdomain = &(match_cpc_ptr->domain_info);
That way offline CPUs with valid descriptors remain in shared_cpu_map
(fixing the hotplug trace), while never-probed CPUs are skipped
instead of failing map construction.
The send_pcc_cmd() hunk already does if (!desc) continue, so reverting
that loop back to for_each_possible_cpu() looks fine as-is.
Happy to send the continue fix as a patch on top, or please feel free
to fold it into yours if that makes sense.
Thanks! I will fold your fix into my patch and add your Co-developed-by
tag in the next version.
Sounds good, thanks.
Sean
Sean
How to reproduce, on arm64 machine with SMT support which use acpi cppc
cpufreq driver:
bash test.sh 220 & bash test.sh 221 &
The test.sh is as below:
while true
do
echo 0 > /sys/devices/system/cpu/cpu${1}/online
sleep 0.5
cat /sys/devices/system/cpu/cpu${1}/cpufreq/related_cpus
echo 1 > /sys/devices/system/cpu/cpu${1}/online
cat /sys/devices/system/cpu/cpu${1}/cpufreq/related_cpus
done
CPU: 221 PID: 1119 Comm: cpuhp/221 Kdump: loaded Not tainted
6.6.0debug+ #5
Hardware name: To be filled by O.E.M. S920X20/BC83AMDA01-7270Z,
BIOS 20.39 09/04/2024
pstate: a1400009 (NzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : cpufreq_online+0x8ac/0xa90
lr : cpuhp_cpufreq_online+0x18/0x30
sp : ffff80008739bce0
x29: ffff80008739bce0 x28: 0000000000000000 x27: ffff28400ca32200
x26: 0000000000000000 x25: 0000000000000003 x24: ffffd483503ff000
x23: ffffd483504051a0 x22: ffffd48350024a00 x21: 00000000000000dd
x20: 000000000000001d x19: ffff28400ca32000 x18: 0000000000000000
x17: 0000000000000020 x16: ffffd4834e6a3fc8 x15: 0000000000000020
x14: 0000000000000008 x13: 0000000000000001 x12: 00000000ffffffff
x11: 0000000000000040 x10: ffffd48350430728 x9 : ffffd4834f087c78
x8 : 0000000000000001 x7 : ffff2840092bdf00 x6 : ffffd483504264f0
x5 : ffffd48350405000 x4 : ffff283f7f95cc60 x3 : 0000000000000000
x2 : ffff53bc2f94b000 x1 : 00000000000000dd x0 : 0000000000000000
Call trace:
cpufreq_online+0x8ac/0xa90
cpuhp_cpufreq_online+0x18/0x30
cpuhp_invoke_callback+0x128/0x580
cpuhp_thread_fun+0x110/0x1b0
smpboot_thread_fn+0x140/0x190
kthread+0xec/0x100
ret_from_fork+0x10/0x20
---[ end trace 0000000000000000 ]---
Cc: stable@xxxxxxxxxxxxxxx
Fixes: 56eb0c0ed345 ("ACPI: CPPC: Fix remaining
for_each_possible_cpu() to use online CPUs")
Signed-off-by: Jinjie Ruan <ruanjinjie@xxxxxxxxxx>
---
drivers/acpi/cppc_acpi.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index f0e513e9ed5d..9ae29f2c6db8 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -362,7 +362,7 @@ static int send_pcc_cmd(int pcc_ss_id, u16 cmd)
end:
if (cmd == CMD_WRITE) {
if (unlikely(ret)) {
- for_each_online_cpu(i) {
+ for_each_possible_cpu(i) {
struct cpc_desc *desc = per_cpu(cpc_desc_ptr, i);
if (!desc)
@@ -524,7 +524,7 @@ int acpi_get_psd_map(unsigned int cpu, struct
cppc_cpudata *cpu_data)
else if (pdomain->coord_type == DOMAIN_COORD_TYPE_SW_ANY)
cpu_data->shared_type = CPUFREQ_SHARED_TYPE_ANY;
- for_each_online_cpu(i) {
+ for_each_possible_cpu(i) {
if (i == cpu)
continue;