Re: [PATCH] cpufreq: CPPC: add autonomous mode boot parameter support
From: Sumit Gupta
Date: Mon Apr 20 2026 - 09:07:56 EST
FWIU the auto_sel_mode module parameter allows toOn 3/17/26 16:10, Sumit Gupta wrote:In an earlier version [1], the setup was in cppc_cpufreq_init() but
Add kernel boot parameter 'cppc_cpufreq.auto_sel_mode' to enable CPPCIf the goal is to set autosel only once at the driver init,
autonomous performance selection on all CPUs at system startup without
requiring runtime sysfs manipulation. When autonomous mode is enabled,
the hardware automatically adjusts CPU performance based on workload
demands using Energy Performance Preference (EPP) hints.
When auto_sel_mode=1:
- Configure all CPUs for autonomous operation on first init
- Set EPP to performance preference (0x0)
- Use HW min/max when set; otherwise program from policy limits (caps)
- Clamp desired_perf to bounds before enabling autonomous mode
- Hardware controls frequency instead of the OS governor
The boot parameter is applied only during first policy initialization.
On hotplug, skip applying it so that the user's runtime sysfs
configuration is preserved.
Reviewed-by: Randy Dunlap <rdunlap@xxxxxxxxxxxxx> (Documentation)
Signed-off-by: Sumit Gupta <sumitg@xxxxxxxxxx>
---
Part 1 [1] of this series was applied for 7.1 and present in next.
Sending this patch as reworked version of 'patch 11' from [2] based
on next.
[1]
https://lore.kernel.org/lkml/20260206142658.72583-1-sumitg@xxxxxxxxxx/
[2]
https://lore.kernel.org/lkml/20251223121307.711773-1-sumitg@xxxxxxxxxx/
---
.../admin-guide/kernel-parameters.txt | 13 +++
drivers/cpufreq/cppc_cpufreq.c | 84
+++++++++++++++++--
2 files changed, 92 insertions(+), 5 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt
b/Documentation/admin-guide/kernel-parameters.txt
index fa6171b5fdd5..de4b4c89edfe 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1060,6 +1060,19 @@ Kernel parameters
policy to use. This governor must be
registered in the
kernel before the cpufreq driver probes.
+ cppc_cpufreq.auto_sel_mode=
+ [CPU_FREQ] Enable ACPI CPPC autonomous
performance
+ selection. When enabled, hardware
automatically adjusts
+ CPU frequency on all CPUs based on workload
demands.
+ In Autonomous mode, Energy Performance
Preference (EPP)
+ hints guide hardware toward performance (0x0)
or energy
+ efficiency (0xff).
+ Requires ACPI CPPC autonomous selection
register support.
+ Format: <bool>
+ Default: 0 (disabled)
+ 0: use cpufreq governors
+ 1: enable if supported by hardware
+
cpu_init_udelay=N
[X86,EARLY] Delay for N microsec between
assert and de-assert
of APIC INIT to start processors. This delay
occurs
diff --git a/drivers/cpufreq/cppc_cpufreq.c
b/drivers/cpufreq/cppc_cpufreq.c
index 5dfb109cf1f4..49c148b2a0a4 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -28,6 +28,9 @@
static struct cpufreq_driver cppc_cpufreq_driver;
+/* Autonomous Selection boot parameter */
+static bool auto_sel_mode;
+
#ifdef CONFIG_ACPI_CPPC_CPUFREQ_FIE
static enum {
FIE_UNSET = -1,
@@ -708,11 +711,74 @@ static int cppc_cpufreq_cpu_init(struct
cpufreq_policy *policy)
policy->cur = cppc_perf_to_khz(caps, caps->highest_perf);
cpu_data->perf_ctrls.desired_perf = caps->highest_perf;
- ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
- if (ret) {
- pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
- caps->highest_perf, cpu, ret);
- goto out;
+ /*
+ * Enable autonomous mode on first init if boot param is set.
+ * Check last_governor to detect first init and skip if auto_sel
+ * is already enabled.
+ */
shouldn't this be done in cppc_cpufreq_init() ?
I understand that cpu_data doesn't exist yet in
cppc_cpufreq_init(), but this seems more appropriate to do
it there IMO.
This means the cpudata should be updated accordingly
in this cppc_cpufreq_cpu_init() function.
was moved to cppc_cpufreq_cpu_init() to improve per-CPU error handling.
Keeping the setup in cppc_cpufreq_init() helps to avoid the last_governor
check. We can warn for a CPU failing to enable and continue so other
CPUs keep autonomous mode.
cppc_cpufreq_cpu_init() would then just check the auto_sel state
from register and sync policy limits from min/max_perf registers when
autonomous mode is active.
Please let me know your thoughts.
configure the default auto_sel_mode when the driver is
first loaded, so there should not need to check that again
whenever cppc_cpufreq_cpu_init() is called.
Maybe Ionela saw something we didn't see ?
AFAIU, the concern in that review [1] was about error handling as the
earlier version disabled auto_sel on all CPUs if any single CPU failed.
Per-CPU error handling in cppc_cpufreq_init() (warn and continue)
addresses that. Can't think of more reason.
Do you have anything in mind?
Also just to be sure, should it still be possible to change
the auto_sel_mode through the sysfs if the driver was
loaded with auto_sel_mode=1 ?
Yes, the per-CPU auto_select sysfs attribute works independently of the
boot param. Users can enable or disable auto_sel on any CPU at runtime
via sysfs, regardless of how the driver was loaded. The boot param only
sets the initial state.
[1]Ah ok right thanks.
https://lore.kernel.org/lkml/5593d364-ca37-41c5-b33f-f7e245d6d626@xxxxxxxxxx/
Agree that the CPPC Enable is about using the CPPC control path+ if (auto_sel_mode && policy->last_governor[0] == '\0' &&The documentation of the CPPC Enable Register is subject to
+ !cpu_data->perf_ctrls.auto_sel) {
+ /* Enable CPPC - optional register, some platforms
need it */
interpretation, but IIUC the field should be set to use the CPPC
controls, so I assume this should be set in cppc_cpufreq_init()
instead ?
in general and not only for autonomous selection.
Will move cppc_set_enable() into cppc_cpufreq_init() or outside the
autonomous mode block in cppc_cpufreq_cpu_init() as per conclusion
of previous comment.
It is newly added in [2].+ ret = cppc_set_enable(cpu, true);The function doesn't seem to exist.
+ if (ret && ret != -EOPNOTSUPP)
+ pr_warn("Failed to enable CPPC for CPU%d
(%d)\n", cpu, ret);
+
+ /*
+ * Prefer HW min/max_perf when set; otherwise program
from
+ * policy limits derived earlier from caps.
+ * Clamp desired_perf to bounds and sync policy->cur.
+ */
+ if (!cpu_data->perf_ctrls.min_perf ||
!cpu_data->perf_ctrls.max_perf)
Don't need to call it if we move the setup to cppc_cpufreq_init().
[2]I need to ping Viresh to check if this is still relevant.
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=ea3db45ae476889a1ba0ab3617e6afdeeefbda3d
Yes.+ cppc_cpufreq_update_perf_limits(cpu_data, policy);Maybe this should also be done in cppc_cpufreq_init()
+
+ cpu_data->perf_ctrls.desired_perf =
+ clamp_t(u32, cpu_data->perf_ctrls.desired_perf,
+ cpu_data->perf_ctrls.min_perf,
+ cpu_data->perf_ctrls.max_perf);
+
+ policy->cur = cppc_perf_to_khz(caps,
+ cpu_data->perf_ctrls.desired_perf);
+
if the auto_sel_mode parameter is set ?
+ /* EPP is optional - some platforms may not support it */There is a patchset ongoing which tries to remove
+ ret = cppc_set_epp(cpu, CPPC_EPP_PERFORMANCE_PREF);
+ if (ret && ret != -EOPNOTSUPP)
+ pr_warn("Failed to set EPP for CPU%d (%d)\n",
cpu, ret);
+ else if (!ret)
+ cpu_data->perf_ctrls.energy_perf =
CPPC_EPP_PERFORMANCE_PREF;
+
+ ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
+ if (ret) {
+ pr_debug("Err setting perf for autonomous mode
CPU:%d ret:%d\n",
+ cpu, ret);
+ goto out;
+ }
+
+ ret = cppc_set_auto_sel(cpu, true);
+ if (ret && ret != -EOPNOTSUPP) {
+ pr_warn("Failed autonomous config for CPU%d
(%d)\n",
+ cpu, ret);
+ goto out;
+ }
+ if (!ret)
+ cpu_data->perf_ctrls.auto_sel = true;
+ }
+
+ if (cpu_data->perf_ctrls.auto_sel) {
setting policy->min/max from driver initialization.
Indeed, these values are only temporarily valid,
until the governor override them.
It is not sure yet the patch will be accepted though.
https://lore.kernel.org/lkml/20260317101753.2284763-4-pierre.gondois@xxxxxxx/
You are right that policy->min/max from .init() are temporary today
as cpufreq_set_policy() overwrites them before the governor starts.
On my test platform (highest == nominal, lowest_nonlinear == lowest),
this had no visible effect because the BIOS bounds and cpuinfo range
end up identical. But on platforms where they differ, the governor
would widen the range to full cpuinfo limits.
I think your patch [3] fixes this by giving these the right semantic as
initial QoS requests. With it, cpufreq_set_policy() preserves the policy
limits set from min/max_perf registers in .init(), which can either be
BIOS values on first boot or last user configured values before hotplug.
I will update the comment in v2 to reflect QoS seeding intent.
I see that the first two patches of your series [3] is applied for 7.1.
Do you plan to send the pending patch (3/4) from [3]?
[3]Yes right, so loading/unloading the driver might change the
https://lore.kernel.org/lkml/20260317101753.2284763-4-pierre.gondois@xxxxxxx/
The auto_sel_mode boot path programs EPP to performance preference(0),
+ /* Sync policy limits from HW when autonomous mode isIf the firmware has a default EPP value, it means that loading
active */
+ policy->min = cppc_perf_to_khz(caps,
+ cpu_data->perf_ctrls.min_perf ?:
+ caps->lowest_nonlinear_perf);
+ policy->max = cppc_perf_to_khz(caps,
+ cpu_data->perf_ctrls.max_perf ?:
+ caps->nominal_perf);
+ } else {
+ /* Normal mode: governors control frequency */
+ ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
+ if (ret) {
+ pr_debug("Err setting perf value:%d on CPU:%d.
ret:%d\n",
+ caps->highest_perf, cpu, ret);
+ goto out;
+ }
}
cppc_cpufreq_cpu_fie_init(policy);
@@ -1038,10 +1104,18 @@ static int __init cppc_cpufreq_init(void)
static void __exit cppc_cpufreq_exit(void)
{
+ unsigned int cpu;
+
+ for_each_present_cpu(cpu)
+ cppc_set_auto_sel(cpu, false);
and the unloading the driver will reset this default EPP value.
Maybe the initial EPP value and/or the auto_sel value should be
cached somewhere and restored on exit ?
I don't know if this is actually an issue, this is just to signal it.
not the firmware’s previous value. On unload we only call
cppc_set_auto_sel(false); we do not restore EPP, min/max perf,
or other CPPC fields to firmware defaults.
default EPP value.
Acknowledged.
With auto_sel_mode, load/unload can leave EPP and other CPPC fields
different from the firmware defaults at boot.
We can add explicit cache-and-restore on exit in a follow-up
if that is desired.
Thank you,
Sumit Gupta