Re: [PATCH v2 2/5] cpufreq: intel_pstate: Always return last EPP value from sysfs

From: Rafael J. Wysocki
Date: Tue Aug 25 2020 - 10:52:03 EST


On Tue, Aug 25, 2020 at 8:20 AM Artem Bityutskiy <dedekind1@xxxxxxxxx> wrote:
>
> On Mon, 2020-08-24 at 19:42 +0200, Rafael J. Wysocki wrote:
> > From: "Rafael J. Wysocki" <rafael.j.wysocki@xxxxxxxxx>
> >
> > Make the energy_performance_preference policy attribute in sysfs
> > always return the last EPP value written to it instead of the one
> > currently in the HWP Request MSR to avoid possible confusion when
> > the performance scaling algorithm is used in the active mode with
> > HWP enabled (in which case the EPP is forced to 0 regardless of
> > what value it has been set to via sysfs).
>
> Why is this a good idea, I wonder. If there was a prior discussion,
> please, point to it.
>
> The general approach to changing settings via sysfs is often like this:
>
> 1. Write new value.
> 2. Read it back and verify that it is the same. Because there is no
> better way to verify that the kernel "accepted" the value.

If the write is successful (ie. no errors returned and the value
returned is equal to the number of written characters), the kernel
*has* accepted the written value, but it may not have taken effect.
These are two different things.

The written value may take an effect immediately or it may take an
effect later, depending on the current configuration etc. If you
don't see the effect of it immediately, it doesn't matter that there
was a failure of some sort.

> Let's say I write 'balanced' to energy_performance_preference. I read
> it back, and it contains 'balanced', so I am happy, I trust the kernel
> changed EPP to "balanced".
>
> If the kernel, in fact, uses something else, I want to know about it
> and have my script fail.

Why do you want it to fail then?

> Why caching the value and making my script _think_ it succeeded is a good idea.

Because when you change the scaling algorithm or the driver's
operation mode, the value you have written will take effect.

In this particular case it is explained in the driver documentation
that the performance scaling algorithm in the active mode overrides
the sysfs value and that's the only case when it can be overridden.
So whatever you write to this attribute will not take effect
immediately anyway, but it may take an effect later.

> In other words, in my usage scenarios at list, I prefer kernel telling
> the true EPP value, not some "cached, but not used" value.

An alternative is to fail writes to energy_performance_preference if
the driver works in the active mode and the scaling algorithm for the
scaling CPU is performance and *then* to make reads from it return the
value in the register.

Accepting a write and returning a different value in a subsequent read
is confusing.

Thanks!