Re: [PATCH v2 7/8] sched/schedutil: Add a new tunable to dictate response time
From: Rafael J. Wysocki
Date: Mon Dec 11 2023 - 15:21:31 EST

On Sun, Dec 10, 2023 at 9:40 PM Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
>
> On 12/08/23 19:06, Rafael J. Wysocki wrote:
> > On Fri, Dec 8, 2023 at 1:24 AM Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> > >
> > > The new tunable, response_time_ms, allows us to speed up or slow down
> > > the response time of the policy to meet the perf, power and thermal
> > > characteristics desired by the user/sysadmin. There's no single universal
> > > trade-off that we can apply for all systems even if they use the same
> > > SoC. The form factor of the system, the dominant use case, and in case
> > > of battery powered systems, the size of the battery and presence or
> > > absence of active cooling can play a big role in what would be best to
> > > use.
> > >
> > > The new tunable provides sensible defaults, yet gives the user/sysadmin
> > > the power to control the response time if they wish to.
> > >
> > > This tunable is applied before we apply the DVFS headroom.
> > >
> > > The default behavior of applying 1.25 headroom can be re-instated easily
> > > now. But we continue to keep the min required headroom to overcome
> > > hardware limitation in its speed to change DVFS. And any additional
> > > headroom to speed things up must be applied by userspace to match their
> > > expectation for best perf/watt as it dictates a type of policy that will
> > > be better for some systems, but worse for others.
> > >
> > > There's a whitespace clean up included in sugov_start().
> > >
> > > Signed-off-by: Qais Yousef (Google) <qyousef@xxxxxxxxxxx>
> >
> > I thought that there was an agreement to avoid adding any new tunables
> > to schedutil.
>
> Oh. I didn't know that.
>
> What alternatives do we have? I can't see how we can universally make the
> response work for every possible system (not just SoC, but different platforms
> with same SoC even) and workloads. We see big power saving with no or little
> perf impact on many workloads when not applying the current 125%. Others want
> to push it faster under gaming scenarios etc to get more stable FPS.
>
> Hopefully uclamp will make the need for this tuning obsolete over time. But
> until userspace gains critical mass, I can't see how we can know the best
> trade-offs for the myriad use cases/systems.
>
> Some are happy to gain more perf and lose power. Others prefer to save power
> over perf. DVFS response time plays a critical role in this trade-off and I'm
> not sure how we can crystal ball it without delegating.

I understand the motivation, but the counter-arguments are based on the
experience with the cpufreq governors predating schedutil, especially
ondemand. Namely, at one point people focused on adjusting all of the
governor tunables to their needs without contributing any code or even
insights back, so when schedutil was introduced, a decision was made
to reduce the tunability to a minimum (preferably no tunables at all,
but it turned out to be hard to avoid the one tunable existing today).
Peter was involved in those discussions and I think that the point
made then is still valid.

The headroom formula was based on the observation that it would be a
good idea to have some headroom in the majority of cases and on the
balance between the simplicity of computation and general suitability.
Of course, it is hard to devise a single value that will work for
everyone, but tunables complicate things from the maintenance
perspective. For example, the more tunables there are, the harder it
is to make changes without altering the behavior in ways that will
break someone's setup.
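
To make the headroom arithmetic concrete, here is a minimal sketch of the
1.25 factor, modeled on the util + (util >> 2) computation used by
schedutil. The function names headroom_util() and util_to_freq() are
hypothetical and used only for illustration, and the sketch leaves out the
clamping and rate limiting that the governor applies around this step.

```c
#include <stdint.h>

/*
 * Illustrative only: add 25% headroom to a utilization value using a
 * shift instead of a multiplication (util * 1.25 == util + util / 4).
 * The name is hypothetical.
 */
static uint64_t headroom_util(uint64_t util)
{
	return util + (util >> 2);
}

/*
 * Illustrative only: map utilization (on a 0..max_cap scale) to a
 * frequency request, proportional to max_freq. No clamping is done
 * here, so the result can exceed max_freq for high utilization.
 */
static uint64_t util_to_freq(uint64_t util, uint64_t max_freq,
			     uint64_t max_cap)
{
	return headroom_util(util) * max_freq / max_cap;
}
```

With max_cap = 1024 and max_freq = 2000000 kHz, a utilization of 512 (50%)
maps to 1250000 kHz rather than 1000000 kHz, and any utilization above
roughly 80% of capacity already requests the maximum frequency. That is
the simplicity versus suitability balance mentioned above: a single shift
and add, no per-system parameter.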