Re: [PATCHv3 0/6] CPPC optional registers AMD support

From: Peter Zijlstra
Date: Sat Jul 13 2019 - 06:46:51 EST


On Wed, Jul 10, 2019 at 06:37:09PM +0000, Natarajan, Janakarajan wrote:
> CPPC (Collaborative Processor Performance Control) offers optional
> registers which can be used to tune the system based on energy and/or
> performance requirements.
>
> Newer AMD processors (>= Family 17h) add support for a subset of these
> optional CPPC registers, based on ACPI v6.1.
>
> The following are the supported CPPC registers for which sysfs entries
> are created:
> * enable (NEW)
> * max_perf (NEW)
> * min_perf (NEW)
> * energy_perf
> * lowest_perf
> * nominal_perf
> * desired_perf (NEW)
> * feedback_ctrs
> * auto_sel_enable (NEW)
> * lowest_nonlinear_perf
>
> First, update cppc_acpi to create sysfs entries only when the optional
> registers are known to be supported.
>
> Next, a new CPUFreq driver is introduced to enable the OSPM and the userspace
> to access the newly supported registers through sysfs entries found in
> /sys/devices/system/cpu/cpu<num>/amd_cpufreq/.
>
> This new CPUFreq driver can only be used by providing a module parameter,
> amd_cpufreq.cppc_enable=1.
>
> The purpose of exposing the registers via the amd-cpufreq sysfs entries is to
> allow the userspace to:
> * Tweak the values to fit its workload.
> * Apply a profile from AMD's optimization guides.

So in general I think it is a huge mistake to expose all that to
userspace. Before you know it, there's tools that actually rely on it,
and then inhibit the kernel from doing anything sane with it.

> Profiles will be documented in the performance/optimization guides.

I don't think userspace can really do anything sane with this; it lacks
much if not all useful information.

> Note:
> * AMD systems will not have a policy applied in the kernel at this time.

And why the heck not? We're trying to move all cpufreq into the
scheduler and have only a single governor, namely schedutil -- yes,
we're still stuck with legacy, and we're still working on performance
parity in some cases, but I really hope to get rid of all other cpufreq
governors eventually.

And if you look at schedutil (schedutil_cpu_util in specific) then
you'll see it is already prepared for CPPC and currently only held back
by the generic cpufreq interface.

It currently only sets desired freq, it has information for
min/guaranteed, and once we get thermal integrated we might have
sensible data for max freq too.
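
To make that concrete, here is a rough sketch (plain C, not kernel code)
of how those three values could map onto the CPPC min/desired/max
registers. Every name in it is hypothetical, the linear util-to-perf
scaling is an assumption, and the thermal cap is simply passed in as a
parameter since that integration does not exist yet.

```c
/* Hypothetical request; fields mirror the CPPC registers named above. */
struct cppc_req {
	unsigned int min_perf;
	unsigned int desired_perf;
	unsigned int max_perf;
};

static unsigned int clamp_perf(unsigned int v, unsigned int lo, unsigned int hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/* Assumed linear mapping of utilization (0..capacity) onto the perf scale. */
static unsigned int util_to_perf(unsigned long util, unsigned long capacity,
				 unsigned int lowest, unsigned int highest)
{
	unsigned long long perf;

	if (util > capacity)
		util = capacity;
	perf = (unsigned long long)highest * util / capacity;
	return clamp_perf((unsigned int)perf, lowest, highest);
}

/*
 * util_min:     utilization that must be guaranteed (hypothetical input)
 * util_total:   total utilization the governor wants to serve
 * thermal_perf: perf ceiling from thermal, once such data is available
 */
struct cppc_req cppc_req_from_util(unsigned long util_min,
				   unsigned long util_total,
				   unsigned long capacity,
				   unsigned int lowest, unsigned int highest,
				   unsigned int thermal_perf)
{
	struct cppc_req r;

	r.max_perf = clamp_perf(thermal_perf, lowest, highest);
	r.min_perf = util_to_perf(util_min, capacity, lowest, highest);
	if (r.min_perf > r.max_perf)
		r.min_perf = r.max_perf;
	r.desired_perf = util_to_perf(util_total, capacity, lowest, highest);
	r.desired_perf = clamp_perf(r.desired_perf, r.min_perf, r.max_perf);
	return r;
}
```

The point of the sketch is only that min <= desired <= max falls out of
data the scheduler side already has or could get; none of it needs to
come from userspace.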

> TODO:
> * Create a linux userspace tool that will help users generate a CPPC profile
> for their target workload.

Basically a big fat NAK for this approach to cpufreq.

> * Create a general CPPC policy in the kernel.

We already have that, sorta.