Re: [PATCH v2 1/1] x86,sched: On AMD EPYC set freq_max = max_boost in schedutil invariant formula

From: Rafael J. Wysocki
Date: Tue Feb 02 2021 - 13:44:47 EST


On Mon, Jan 25, 2021 at 11:11 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Fri, Jan 22, 2021 at 09:40:38PM +0100, Giovanni Gherdovich wrote:
> > This workload is constant in time, so instead of using the PELT sum we can
> > pretend that scale invariance is obtained with
> >
> > util_inv = util_raw * freq_curr / freq_max1 [formula-1]
> >
> > where util_raw is the PELT util from v5.10 (which is to say, not invariant),
> > and util_inv is the PELT util from v5.11-rc4. freq_max1 comes from
> > commit 976df7e5730e ("x86, sched: Use midpoint of max_boost and max_P for
> > frequency invariance on AMD EPYC") and is (P0+max_boost)/2 = (2.25+3.4)/2 =
> > 2.825 GHz. Then we have the schedutil formula
> >
> > freq_next = 1.25 * freq_max2 * util_inv [formula-2]
> >
> > Here v5.11-rc4 uses freq_max2 = P0 = 2.25 GHz (and this patch changes it to
> > 3.4 GHz).
> >
> > Since all cores are busy, there is no boost available. Let's be generous and say
> > the tasks initially get P0, i.e. freq_curr = 2.25 GHz. Combining the formulas
> > above and taking util_raw = 825/1024 = 0.8, freq_next is:
> >
> > freq_next = 1.25 * 2.25 * 0.8 * 2.25 / 2.825 = 1.79 GHz
>
> Right, so here's a 'problem' between schedutil and cpufreq, they don't
> use the same f_max at all times.
>
> And this is also an inconsistency between acpi_cpufreq and intel_pstate
> (passive). IIRC the intel_pstate cpufreq driver uses 4C/1C/P0 resp,
> while ACPI seems to stick to P0 f_max.

The only place where 4C is used is the scale invariance code AFAICS.

intel_pstate uses P0 as the f_max unless turbo is disabled.

The difference between intel_pstate and acpi_cpufreq is that (a) the
latter uses a frequency table and the former doesn't and (b) the
latter uses the P0 entry of the frequency table to represent the
entire turbo range.

> Rafael; should ACPI change that behaviour rather than adding yet another
> magic variable?

I'm not sure. That may change the behavior from what is expected by some users.
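
For reference, the arithmetic in Giovanni's quoted example can be
reproduced with a small sketch. This is only a back-of-the-envelope
check of [formula-1] and [formula-2] as quoted above, not kernel code:
the function and variable names are made up for illustration, and the
value computed with freq_max2 = 3.4 GHz is just the raw output of
[formula-2], before any clamping a cpufreq driver might apply.

```python
# Illustrative arithmetic only; all names here are hypothetical,
# not kernel identifiers.
P0_GHZ = 2.25         # max non-boost frequency from the quoted example
MAX_BOOST_GHZ = 3.4   # max boost frequency from the quoted example


def util_inv(util_raw, freq_curr, freq_max1):
    """[formula-1]: scale the raw utilization by freq_curr / freq_max1."""
    return util_raw * freq_curr / freq_max1


def freq_next(util, freq_max2):
    """[formula-2]: schedutil's next-frequency request."""
    return 1.25 * freq_max2 * util


# freq_max1 is the midpoint of P0 and max_boost (commit 976df7e5730e).
freq_max1 = (P0_GHZ + MAX_BOOST_GHZ) / 2

# util_raw = 825/1024, rounded to 0.8 as in the quoted example;
# the tasks are assumed to start at freq_curr = P0.
u = util_inv(0.8, P0_GHZ, freq_max1)

next_p0 = freq_next(u, P0_GHZ)            # freq_max2 = P0 (v5.11-rc4)
next_boost = freq_next(u, MAX_BOOST_GHZ)  # freq_max2 = max_boost (patch)

print(f"freq_max1             = {freq_max1:.3f} GHz")
print(f"freq_next (2.25 GHz)  = {next_p0:.2f} GHz")
print(f"freq_next (3.4 GHz)   = {next_boost:.2f} GHz")
```

With freq_max2 = P0 this gives the 1.79 GHz figure from the quoted
text, which is below P0 even though all cores are busy.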