Re: [PATCH v3 2/2] cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems

From: Rafael J. Wysocki
Date: Wed Sep 04 2024 - 07:30:50 EST


On Wed, Sep 4, 2024 at 8:33 AM Ricardo Neri
<ricardo.neri-calderon@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, Aug 28, 2024 at 01:48:10PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >
> > Make intel_pstate use the HWP_HIGHEST_PERF values from
> > MSR_HWP_CAPABILITIES to set asymmetric CPU capacity information
> > via the previously introduced arch_set_cpu_capacity() on hybrid
> > systems without SMT.
> >
> > Setting asymmetric CPU capacity is generally necessary to allow the
> > scheduler to compute task sizes in a consistent way across all CPUs
> > in a system where they differ by capacity. That, in turn, should help
> > to improve scheduling decisions. It is also necessary for the schedutil
> > cpufreq governor to operate as expected on hybrid systems where tasks
> > migrate between CPUs of different capacities.
> >
> > The underlying observation is that intel_pstate already uses
> > MSR_HWP_CAPABILITIES to get CPU performance information which is
> > exposed by it via sysfs and CPU performance scaling is based on it.
> > Thus using this information for setting asymmetric CPU capacity is
> > consistent with what the driver has been doing already. Moreover,
> > HWP_HIGHEST_PERF reflects the maximum capacity of a given CPU including
> > both the instructions-per-cycle (IPC) factor and the maximum turbo
> > frequency and the units in which that value is expressed are the same
> > for all CPUs in the system, so the maximum capacity ratio between two
> > CPUs can be obtained by computing the ratio of their HWP_HIGHEST_PERF
> > values. Of course, in principle that capacity ratio need not be
> > directly applicable at lower frequencies, so using it for providing the
> > asymmetric CPU capacity information to the scheduler is a rough
> > approximation, but it is as good as it gets. Also, measurements
> > indicate that this approximation is not too bad in practice.
> >
> > If the given system is hybrid and non-SMT, the new code disables ITMT
> > support in the scheduler (because it may get in the way of asymmetric CPU
> > capacity code in the scheduler that automatically gets enabled by setting
> > asymmetric CPU capacity) after initializing all online CPUs and finds
> > the one with the maximum HWP_HIGHEST_PERF value. Next, it computes the
> > capacity number for each (online) CPU by dividing the product of its
> > HWP_HIGHEST_PERF and SCHED_CAPACITY_SCALE by the maximum HWP_HIGHEST_PERF.
> >
> > When a CPU goes offline, its capacity is reset to SCHED_CAPACITY_SCALE
> > and if it is the one with the maximum HWP_HIGHEST_PERF value, the
> > capacity numbers for all of the other online CPUs are recomputed. This
> > also takes care of a cleanup during driver operation mode changes.
> >
> > Analogously, when a new CPU goes online, its capacity number is updated
> > and if its HWP_HIGHEST_PERF value is greater than the current maximum
> > one, the capacity numbers for all of the other online CPUs are
> > recomputed.
> >
> > The case when the driver is notified of a CPU capacity change, either
> > through the HWP interrupt or through an ACPI notification, is handled
> > similarly to the CPU online case above, except that if the target CPU
> > is the current highest-capacity one and its capacity is reduced, the
> > capacity numbers for all of the other online CPUs need to be recomputed
> > either.
> >
> > If the driver's "no_trubo" sysfs attribute is updated, all of the CPU
> > capacity information is computed from scratch to reflect the new turbo
> > status.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>
> A few minor comments below...
>
> FWIW,
>
> Reviewed-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
> Tested-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx> # scale invariance
>
> [...]
>
> > +
> > +static void hybrid_init_cpu_scaling(void)
>
> Maybe renaming hybrid_init_cpu_scaling() as hybrid_init_cpu_capacity_scaling(),
> __hybrid_init_cpu_scaling() as __hybrid_init_cpu_capacity_scaling(), and
> hybrid_update_cpu_scaling() as hybrid_update_cpu_capacity_scaling()?
>
> It would make the code easier to read.

Sure, if that helps.

> > +{
> > + bool disable_itmt = false;
> > +
> > + mutex_lock(&hybrid_capacity_lock);
> > +
> > + /*
> > + * If hybrid_max_perf_cpu is set at this point, the hybrid CPU capacity
> > + * scaling has been enabled already and the driver is just changing the
> > + * operation mode.
> > + */
> > + if (hybrid_max_perf_cpu) {
> > + __hybrid_init_cpu_scaling();
> > + goto unlock;
> > + }
> > +
> > + /*
> > + * On hybrid systems, use asym capacity instead of ITMT, but because
> > + * the capacity of SMT threads is not deterministic even approximately,
> > + * do not do that when SMT is in use.
> > + */
> > + if (hwp_is_hybrid && !sched_smt_active() && arch_enable_hybrid_capacity_scale()) {
> > + __hybrid_init_cpu_scaling();
> > + disable_itmt = true;
> > + }
> > +
> > +unlock:
> > + mutex_unlock(&hybrid_capacity_lock);
> > +
> > + if (disable_itmt)
> > + sched_clear_itmt_support();
>
> It may be worth adding a comment here saying that the sched domains will
> rebuilt to disable asym packing and enable asym capacity.

Won't hurt I suppose.