Re: [PATCH v2 2/5] cpufreq: intel_pstate: Always return last EPP value from sysfs
From: Rafael J. Wysocki
Date: Tue Aug 25 2020 - 11:53:19 EST
On Tue, Aug 25, 2020 at 5:27 PM Srinivas Pandruvada
<srinivas.pandruvada@xxxxxxxxxxxxxxx> wrote:
>
> On Tue, 2020-08-25 at 17:14 +0200, Rafael J. Wysocki wrote:
> > On Tue, Aug 25, 2020 at 5:06 PM Srinivas Pandruvada
> > <srinivas.pandruvada@xxxxxxxxxxxxxxx> wrote:
> > > On Tue, 2020-08-25 at 16:51 +0200, Rafael J. Wysocki wrote:
> > > > On Tue, Aug 25, 2020 at 8:20 AM Artem Bityutskiy <
> > > > dedekind1@xxxxxxxxx
> > > > > wrote:
> > > > > On Mon, 2020-08-24 at 19:42 +0200, Rafael J. Wysocki wrote:
> > > > > > From: "Rafael J. Wysocki" <rafael.j.wysocki@xxxxxxxxx>
> > > > > >
> > > > > > Make the energy_performance_preference policy attribute in
> > > > > > sysfs
> > > > > > always return the last EPP value written to it instead of the
> > > > > > one
> > > > > > currently in the HWP Request MSR to avoid possible confusion
> > > > > > when
> > > > > > the performance scaling algorithm is used in the active mode
> > > > > > with
> > > > > > HWP enabled (in which case the EPP is forced to 0 regardless
> > > > > > of
> > > > > > what value it has been set to via sysfs).
> > > > >
> > > > > Why is this a good idea, I wonder. If there was a prior
> > > > > discussion,
> > > > > please, point to it.
> > > > >
> > > > > The general approach to changing settings via sysfs is often
> > > > > like
> > > > > this:
> > > > >
> > > > > 1. Write new value.
> > > > > 2. Read it back and verify that it is the same. Because there
> > > > > is no
> > > > > better way to verify that the kernel "accepted" the value.
> > > >
> > > > If the write is successful (ie. no errors returned and the value
> > > > returned is equal to the number of written characters), the
> > > > kernel
> > > > *has* accepted the written value, but it may not have taken
> > > > effect.
> > > > These are two different things.
> > > >
> > > > The written value may take an effect immediately or it may take
> > > > an
> > > > effect later, depending on the current configuration etc. If you
> > > > don't see the effect of it immediately, it doesn't matter that
> > > > there
> > > > was a failure of some sort.
> > > >
> > > > > Let's say I write 'balanced' to energy_performance_preference.
> > > > > I
> > > > > read
> > > > > it back, and it contains 'balanced', so I am happy, I trust the
> > > > > kernel
> > > > > changed EPP to "balanced".
> > > > >
> > > > > If the kernel, in fact, uses something else, I want to know
> > > > > about
> > > > > it
> > > > > and have my script fail.
> > > >
> > > > Why do you want it to fail then?
> > > >
> > > > > Why caching the value and making my script _think_ it succeeded
> > > > > is
> > > > > a good idea.
> > > >
> > > > Because when you change the scaling algorithm or the driver's
> > > > operation mode, the value you have written will take effect.
> > > >
> > > > In this particular case it is explained in the driver
> > > > documentation
> > > > that the performance scaling algorithm in the active mode
> > > > overrides
> > > > the sysfs value and that's the only case when it can be
> > > > overridden.
> > > > So whatever you write to this attribute will not take effect
> > > > immediately anyway, but it may take an effect later.
> > >
> > > In some cases without even changing active/passive this is
> > > happening
> > > when there was some error previously. For example:
> > >
> > > #cat energy_performance_preference
> > > 127
> > > [root@otcpl-perf-test-skx-i9 cpufreq]# rdmsr -p 1 0x774
> > > 8000ff00
> > >
> > > I think we should show reality. In mode change can be a special
> > > case
> > > and use the stored value to restore in new mode.
> >
> > OK, so I'll make it fail on attempts to change the EPP from 0
> > (performance) in the active mode with the performance "governor".
> >
> Here the scaling governor is powersave.
>
> # cat scaling_governor
> powersave
What I'm saying is that reads from energy_performance_preference will
still return the register value, but writes to it will fail on
attempts to change to anything different from "performance" when in
the active mode and the current governor is "performance".
Cheers!