Re: [PATCH v5 5/6] sched/fair: Allow load balancing between CPUs of identical capacity
From: Vincent Guittot
Date: Tue Jun 30 2026 - 03:01:22 EST
On Mon, 29 Jun 2026 at 18:23, Ricardo Neri
<ricardo.neri-calderon@xxxxxxxxxxxxxxx> wrote:
>
> On Mon, Jun 29, 2026 at 05:58:56PM +0200, Vincent Guittot wrote:
> > On Mon, 29 Jun 2026 at 17:54, Rafael J. Wysocki (Intel)
> > <rafael@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Jun 29, 2026 at 5:35 PM Vincent Guittot
> > > <vincent.guittot@xxxxxxxxxx> wrote:
> > > >
> > > > On Sat, 27 Jun 2026 at 03:53, Ricardo Neri
> > > > <ricardo.neri-calderon@xxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, Jun 26, 2026 at 04:50:12PM +0200, Vincent Guittot wrote:
> > > > > > On Fri, 26 Jun 2026 at 02:02, Ricardo Neri
> > > > > > <ricardo.neri-calderon@xxxxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Tue, Jun 23, 2026 at 10:25:14PM -0700, Ricardo Neri wrote:
> > > > > > > > On Tue, Jun 23, 2026 at 08:45:23AM +0100, Christian Loehle wrote:
> > > > > > > > > On 6/23/26 08:20, Vincent Guittot wrote:
> > > > > > > > > > On Tue, 23 Jun 2026 at 01:55, Ricardo Neri
> > > > > > > > > > <ricardo.neri-calderon@xxxxxxxxxxxxxxx> wrote:
> > > > > > > > > >>
> > > > > > > > > >> sched_balance_find_src_rq() avoids selecting a runqueue with a single
> > > > > > > > > >> running task as busiest if doing so results in migrating the task to a
> > > > > > > > > >> CPU with less than ~5% of extra capacity. It also unintentionally
> > > > > > > > > >> prevents migrations between CPUs of identical capacity.
> > > > > > > > > >>
> > > > > > > > > >> When CONFIG_SCHED_CLUSTER is enabled, load should be balanced across
> > > > > > > > > >> clusters of CPUs with the same capacity. Allowing migration between CPUs
> > > > > > > > > >> of identical capacity is necessary to meet this goal.
> > > > > > > > > >>
> > > > > > > > > >> Use arch_scale_cpu_capacity() to reflect architectural capacity, excluding
> > > > > > > > > >
> > > > > > > > > > capacity_of() reflects not only RT and IRQ pressure but also thermal
> > > > > > > > > > pressure and system frequency capping.
> > > > > > > > > > If the dst cluster is under thermal mitigation but the source cluster
> > > > > > > > > > is not, we probably shouldn't spread tasks across both clusters.
> > > > > > > > > > Have you considered using get_actual_cpu_capacity() instead of
> > > > > > > > > > arch_scale_cpu_capacity() ?
> > > > > > > > >
> > > > > > > > > Replacing arch_scale_cpu_capacity() with get_actual_cpu_capacity()
> > > > > > > > > would make the == comparison below very unlikely to be true FWIW.
> > > > > > > >
> > > > > > > > Yes, this is what I thought too. I did not try with get_actual_cpu_capacity(),
> > > > > > > > though. Perhaps on Intel processors it would work since rq->avg_hw.load_avg
> > > > > > > > is not used, IIUC. I am not sure about cpufreq_pressure. I need to check.
> > > > > > > >
> > > > > > > > Still, it may work for Intel processors but not for ARM ones.
> > > > > > > >
> > > > > > > > > I think it's fine like that; I will prepare a follow-up anyway to make
> > > > > > > > > it work for our "almost equal capacity" cluster systems and then also
> > > > > > > > > consider switching to get_actual_cpu_capacity() since we include a margin
> > > > > > > > > anyway.
> > > > > > > >
> > > > > > > > Great!
> > > > > > >
> > > > > > > I confirmed that x86 uses neither rq->avg_hw.load_avg nor cpufreq_pressure.
> > > > > >
> > > > > > I'm not surprised that Intel doesn't use rq->avg_hw.load_avg, but I'm
> > > > > > pretty sure that you use cpufreq_pressure, because any call to
> > > > > > freq_qos_add_request(..., FREQ_QOS_MAX), like scaling_max_freq, will
> > > > > > update cpufreq_pressure.
> > > > >
> > > > > But in cpufreq_update_pressure() a non-zero pressure can only be computed
> > > > > if arch_scale_freq_ref() returns non-zero. x86 does not implement this
> > > > > function.
> > > > >
> > > > > The check max_freq <= capped_freq is always true because the default
> > > > > arch_scale_freq_ref() returns 0. The computed pressure is always 0. Am I
> > > > > missing something?
> > > >
> > > > Yeah, I forgot that Intel has not enabled it. You should consider
> > > > enabling it, for the reason below.
>
> Sure, I was thinking about the same.
> > >
> > > Aren't things still supposed to work if it is not enabled, though?
>
> In my experiments, after capping frequency I did not observe changes in the
> behavior of the load balancer.
>
> >
> > Without it, the scheduler is not aware of any cpufreq QoS max_freq
> > constraint applied to the CPUs, and it considers the CPUs able to run
> > at their maximum capacity.
>
> I prototyped code for arch_scale_freq_ref() (along with using
> get_actual_cpu_capacity()) in my patchset. I did observe the load balancer
> moving tasks towards uncapped CPUs.
>
> I can post the patch for arch_scale_freq_ref() separately.
Thanks