RE: [PATCHv3 0/6] CPPC optional registers AMD support

From: Ghannam, Yazen
Date: Mon Jul 15 2019 - 13:57:20 EST


> -----Original Message-----
> From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Sent: Saturday, July 13, 2019 5:46 AM
> To: Natarajan, Janakarajan <Janakarajan.Natarajan@xxxxxxx>
> Cc: linux-acpi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-pm@xxxxxxxxxxxxxxx; devel@xxxxxxxxxx; Rafael J . Wysocki
> <rjw@xxxxxxxxxxxxx>; Len Brown <lenb@xxxxxxxxxx>; Viresh Kumar <viresh.kumar@xxxxxxxxxx>; Robert Moore
> <robert.moore@xxxxxxxxx>; Erik Schmauss <erik.schmauss@xxxxxxxxx>; Ghannam, Yazen <Yazen.Ghannam@xxxxxxx>
> Subject: Re: [PATCHv3 0/6] CPPC optional registers AMD support
>
> On Wed, Jul 10, 2019 at 06:37:09PM +0000, Natarajan, Janakarajan wrote:
> > CPPC (Collaborative Processor Performance Control) offers optional
> > registers which can be used to tune the system based on energy and/or
> > performance requirements.
> >
> > Newer AMD processors (>= Family 17h) add support for a subset of these
> > optional CPPC registers, based on ACPI v6.1.
> >
> > The following are the supported CPPC registers for which sysfs entries
> > are created:
> > * enable (NEW)
> > * max_perf (NEW)
> > * min_perf (NEW)
> > * energy_perf
> > * lowest_perf
> > * nominal_perf
> > * desired_perf (NEW)
> > * feedback_ctrs
> > * auto_sel_enable (NEW)
> > * lowest_nonlinear_perf
> >
> > First, update cppc_acpi to create sysfs entries only when the optional
> > registers are known to be supported.
> >
> > Next, a new CPUFreq driver is introduced to enable the OSPM and the userspace
> > to access the newly supported registers through sysfs entries found in
> > /sys/devices/system/cpu/cpu<num>/amd_cpufreq/.
> >
> > This new CPUFreq driver can only be used by providing a module parameter,
> > amd_cpufreq.cppc_enable=1.
> >
> > The purpose of exposing the registers via the amd-cpufreq sysfs entries is to
> > allow the userspace to:
> > * Tweak the values to fit its workload.
> > * Apply a profile from AMD's optimization guides.
>
> So in general I think it is a huge mistake to expose all that to
> userspace. Before you know it, there's tools that actually rely on it,
> and then inhibit the kernel from doing anything sane with it.
>

Okay, makes sense.

Is there any way to expose a sysfs interface and make it explicitly "experimental"? Maybe putting it in Documentation/ABI/testing/?

Or do you think it's just not worth it?

> > Profiles will be documented in the performance/optimization guides.
>
> I don't think userspace can really do anything sane with this; it lacks
> much if not all useful information.
>
> > Note:
> > * AMD systems will not have a policy applied in the kernel at this time.
>
> And why the heck not? We're trying to move all cpufreq into the
> scheduler and have only a single governor, namely schedutil -- yes,
> we're still stuck with legacy, and we're still working on performance
> parity in some cases, but I really hope to get rid of all other cpufreq
> governors eventually.
>

Because this is new to AMD systems, we didn't want to enforce a default policy.

We figured that exposing the CPPC interface would be a good way to decouple policy from the kernel and let users experiment/tune their systems, like using the userspace governor. And if some pattern emerged then we could make that a default policy in the kernel (for AMD or in general).

But you're saying we should focus more on working with the schedutil governor, correct? Do you think there's still a use for a userspace governor?

> And if you look at schedutil (schedutil_cpu_util in specific) then
> you'll see it is already prepared for CPPC and currently only held back
> by the generic cpufreq interface.
>
> It currently only sets desired freq, it has information for
> min/guaranteed, and once we get thermal intergrated we might have
> sensible data for max freq too.
>

Will do.

> > TODO:
> > * Create a linux userspace tool that will help users generate a CPPC profile
> > for their target workload.
>
> Basically a big fat NAK for this approach to cpufreq.
>

Is that for exposing the sysfs interface, having a stub driver, or both?

Would it be better to have a cpufreq driver that implements some policy rather than just providing the sysfs interface?

> > * Create a general CPPC policy in the kernel.
>
> We already have that, sorta.

Right, but it seems to still be focused on CPU frequency rather than abstract performance like how CPPC is defined.

This is another reason for exposing the CPPC interface directly. We'll give users the ability to interact with the platform, using CPPC, without having to follow the CPUFREQ paradigm.

Do you think this is doable? Or should we always have some kernel interaction because of the scheduler, etc.?

Thanks,
Yazen