Re: [PATCH v7 4/7] ACPI: CPPC: add APIs and sysfs interface for min/max_perf
From: Rafael J. Wysocki
Date: Thu Feb 05 2026 - 14:28:10 EST
On Thu, Feb 5, 2026 at 8:21 PM Sumit Gupta <sumitg@xxxxxxxxxx> wrote:
>
> >>>>>>>>>>> Hi Sumit,
> >>>>>>>>>>>
> >>>>>>>>>>> I am thinking that maybe it is better to call these two sysfs
> >>>>>>>>>>> interface
> >>>>>>>>>>> 'min_freq' and 'max_freq' as users read and write khz instead
> >>>>>>>>>>> of raw
> >>>>>>>>>>> value.
> >>>>>>>>>> Thanks for the suggestion.
> >>>>>>>>>> Kept min_perf/max_perf to match the CPPC register names
> >>>>>>>>>> (MIN_PERF/MAX_PERF), making it clear to users familiar with
> >>>>>>>>>> CPPC what's being controlled.
> >>>>>>>>>> The kHz unit is documented in the ABI.
> >>>>>>>>>>
> >>>>>>>>>> Thank you,
> >>>>>>>>>> Sumit Gupta
> >>>>>>>>> On my x86 machine with kernel 6.18.5, the kernel is exposing raw
> >>>>>>>>> values:
> >>>>>>>>>
> >>>>>>>>>> grep . /sys/devices/system/cpu/cpu0/acpi_cppc/*
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/feedback_ctrs:ref:342904018856568
> >>>>>>>>>
> >>>>>>>>> del:437439724183386
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/guaranteed_perf:63
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf:88
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq:0
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf:36
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf:1
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq:3900
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf:62
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf:62
> >>>>>>>>> /sys/devices/system/cpu/cpu0/acpi_cppc/wraparound_time:18446744073709551615
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> It would be surprising for a nearby sysfs interface with very
> >>>>>>>>> similar
> >>>>>>>>> names to use kHz instead.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> Russell Haley
> >>>>>>>> I can rename to either of the below:
> >>>>>>>> - min/max_freq: might be confused with scaling_min/max_freq.
> >>>>>>>> - min/max_perf_freq: keeps the CPPC register association clear.
> >>>>>>>>
> >>>>>>>> Rafael, Any preferences here?
> >>>>>>> On x86 the units in CPPC are not kHz and there is no easy reliable
> >>>>>>> way
> >>>>>>> to convert them to kHz.
> >>>>>>>
> >>>>>>> Everything under /sys/devices/system/cpu/cpu0/acpi_cppc/ needs to be
> >>>>>>> in CPPC units, not kHz (unless, of course, kHz are CPPC units).
> >>>>>
> >>>>> In v1 [1], these controls were added under acpi_cppc sysfs.
> >>>>> After discussion, they were moved under cpufreq, and [2] was merged
> >>>>> first.
> >>>>> The decision to use frequency scale instead of raw perf was made
> >>>>> for consistency with other cpufreq interfaces as per (v3 [3]).
> >>>>>
> >>>>> CPPC units in our case are also not in kHz. The kHz conversion uses the
> >>>>> existing cppc_perf_to_khz()/cppc_khz_to_perf() helpers which are
> >>>>> already
> >>>>> used in cppc_cpufreq attributes. So the conversion behavior is
> >>>>> consistent
> >>>>> with existing cpufreq interfaces.
> >>>>>
> >>>>> [1]
> >>>>> https://lore.kernel.org/lkml/076c199c-a081-4a7f-956c-f395f4d5e156@xxxxxxxxxx/
> >>>>>
> >>>>> [2]
> >>>>> https://lore.kernel.org/all/20250507031941.2812701-1-zhenglifeng1@xxxxxxxxxx/
> >>>>>
> >>>>> [3]
> >>>>> https://lore.kernel.org/lkml/80e16de0-63e4-4ead-9577-4ebba9b1a02d@xxxxxxxxxx/
> >>>>>
> >>>>>
> >>>>>> That said, the new attributes will show up elsewhere.
> >>>>>>
> >>>>>> So why do you need to add these things in the first place?
> >>>>> Currently there's no sysfs interface to dynamically control the
> >>>>> MIN_PERF/MAX_PERF bounds when using autonomous mode. This helps
> >>>>> users tune power and performance at runtime.
> >>>> So what about scaling_min_freq and scaling_max_freq?
> >>>>
> >>>> intel_pstate uses them for an analogous purpose.
> >>> FWIW same thing for amd_pstate.
> >>>
> >> intel_pstate and amd_pstate seem to use setpolicy() to update
> >> scaling_min/max_freq and program MIN_PERF/MAX_PERF.
> > That's one possibility.
> >
> > intel_pstate has a "cpufreq-compatible" mode (in which case it is
> > called intel_cpufreq) and still uses HWP (which is the underlying
> > mechanism for CPPC on Intel platforms).
> >
> >> However, as discussed in v5 [1], cppc_cpufreq cannot switch to
> >> a setpolicy based approach because:
> >> - We need per-CPU control of auto_sel: With setpolicy, we can't
> >> dynamically disable auto_sel for individual CPUs and return to the
> >> target() (no target hook available).
> >> intel_pstate and amd_pstate seem to set HW autonomous mode for
> >> all CPUs, not per-CPU.
> >> - We need to retain the target() callback - the CPPC spec allows
> >> desired_perf to be used even when autonomous selection is enabled.
> > intel_pstate in the "cpufreq-compatible" mode updates its HWP min and
> > max limits when .target() (or .fast_switch() or .adjust_perf()) is
> > called.
> >
> > I guess that would not be sufficient in cppc_cpufreq for some reason?
> >
> >> [1]
> >> https://lore.kernel.org/lkml/66f58f43-631b-40a0-8d42-4e90cd24b757@xxxxxxx/
>
> We can do the same as intel_cpufreq. CPPC spec allows setting
> MIN_PERF/MAX_PERF even when auto_selection is disabled, so we will
> have to update them always from policy limits in target().
>
> However, this would override BIOS-configured MIN_PERF/MAX_PERF values.
> Since policy->min/max are set from hardware capabilities during init,
> any governor would overwrite BIOS bounds with policy limits (hardware
> capability bounds) on their first frequency request - even when user
> hasn't explicitly changed scaling_min/max_freq.
>
> Does intel_cpufreq also override BIOS-configured HWP min/max values?
Yes, it does.
> Should we preserve BIOS-configured values until user explicitly changes
> scaling_min/max_freq?
Why would that be useful?
> Is there any mechanism in cpufreq core to detect explicit user changes to scaling_min/max_freq?
Not today, but since scaling_min/max_freq have their own freq QoS
requests, it should be doable if need be.
In any case, I would very much prefer using the existing
scaling_min/max_freq interface, even if that would require some
additional plumbing, to adding new sysfs attributes pretty much for
the same purpose that would only be used by one driver.