Re: [PATCH 4/8] cpufreq/schedutil: sysfs capacity margin tunable

From: Michael Turquette
Date: Thu Mar 17 2016 - 14:56:18 EST


Quoting Juri Lelli (2016-03-17 10:54:07)
> Hi,
>
> On 17/03/16 15:53, Patrick Bellasi wrote:
> > On 17-Mar 06:55, Steve Muckle wrote:
> > > On 03/17/2016 02:40 AM, Juri Lelli wrote:
> > > >> Could the default schedtune value not serve as the out of the box margin?
> > > >>
> > > > I'm not sure I understand you here. For me schedtune should be disabled
> > > > by default, so I'd say that it doesn't introduce any additional margin
> > > > by default. But we still need a margin to make the governor work without
> > > > schedtune in the mix.
> > >
> > > Why not have schedtune be enabled always, and use it to add the margin?
> > > It seems like it'd simplify things.
> >
> > Actually one of the effects we noticed when SchedTune and SchedFreq
> > are both in use is that we have a sort of "double boosting" effect.
> >
> > SchedTune boosts the CPU utilization signal, thus already providing a
> > sort of margin for the selection of the OPP. This margin overlaps with
> > the SchedFreq margin, which in turn could result in the selection of
> > an OPP even higher than required (with boost already accounted for).
> >
> > > I haven't looked at the schedtune code at all so I don't know whether
> > > this makes sense given its current implementation.
> >
> > The current implementation requires review, of course ;-)
> > Last (and only) posting is based on top of SchedFreq code, as it was
> > at that time.
> >
> > > But conceptually I don't know why we'd need or want one margin in
> > > schedutil which will be tunable, and then another mechanism for
> > > tuning as well.
> >
> > I agree with Steve on the conceptual standpoint. The main goal of
> > SchedTune is actually to provide a "single tunable" to bias many
> > different subsystems in a "consistent" way. Thus, from a conceptual
> > standpoint, IMO it makes sense to investigate better how the boost value
> > can be linked with SchedFreq.
> >
> > A possible option can be to:
> > 1. use a hardcoded margin (M) defined by SchedFreq
> >    this margin is used to trigger OPP jumps
> >    when SchedTune _is not_ in use
> > 2. "compose" the M margin with a boost value defined margin (B)
> >    when SchedTune _is_ in use
> >
> > This means, e.g.
> >    schedfreq_margin = max(M, B)
> > Thus:
> > a) non boosted tasks (and in general when SchedTune is not in use)
> >    get OPP jumps based on the hardcoded M margin
> > b) boosted tasks can get more aggressive OPP jumps based on the B
> >    margin
> >
> > While the M margin is hardcoded, the B one is defined via CGroups
> > depending on how much tasks need to be boosted.
> >
>
> Makes sense to me. And I think M margin is the one we don't want to make
> part of the ABI and only play with it under DEBUG.

Correct.

Regarding "composing" the margin, schedtune could even overwrite the
margin entirely via cpufreq_set_cfs_capacity_margin (see patch #2 in
this series). This avoids complications around a "double boosting"
effect.
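
To make the two options concrete, here is a minimal sketch in plain
userspace C, not kernel code. Every name and value in it is made up for
illustration: the 1024 fixed-point scale, the 1280 default for M, and
the helper names are all assumptions, and none of this is the actual
signature or behaviour of cpufreq_set_cfs_capacity_margin from patch #2.

```c
/* Illustrative sketch only; all names and values are hypothetical. */

#define CAPACITY_SCALE   1024UL  /* assumed fixed-point unit (1.0) */
#define HARDCODED_MARGIN 1280UL  /* assumed M: 1280/1024 = 1.25x headroom */

/* Patrick's option: schedfreq_margin = max(M, B). */
static unsigned long compose_margin(unsigned long m, unsigned long b)
{
	return m > b ? m : b;
}

/*
 * Override option: when schedtune is active its margin B replaces M
 * entirely, even if B is smaller than M.
 */
static unsigned long override_margin(unsigned long m, unsigned long b,
				     int schedtune_active)
{
	return schedtune_active ? b : m;
}

/* Capacity to request for a given utilization, scaled by the margin. */
static unsigned long capacity_with_margin(unsigned long util,
					  unsigned long margin)
{
	return util * margin / CAPACITY_SCALE;
}
```

With M = 1280 and a boost-derived B = 1100, compose_margin() keeps 1280
while override_margin() drops to 1100 when schedtune is active. Either
way only one margin is applied to the utilization, so the two do not
stack.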

Either way, it sounds like the schedtune angle is something that we can
figure out in due time and change the code as needed later on. For
schedutil to make sense for frequency-invariant platforms we do need a
margin today, and there is desire to tune it easily, so I will move this
sysfs knob to a debug knob in v2.

Regards,
Mike

>
> Best,
>
> - Juri