Re: [PATCH 0/6 v8] sched/fair: Add push task mechanism and handle more EAS cases

From: Vincent Guittot

Date: Thu Mar 12 2026 - 04:19:56 EST


On Tue, 10 Mar 2026 at 18:00, Pierre Gondois <pierre.gondois@xxxxxxx> wrote:
>
>
> On 3/10/26 16:11, Qais Yousef wrote:
> > On 03/10/26 11:27, Pierre Gondois wrote:
> >
> >> If we have 2 little CPUs (CPU0/CPU1) with 4 tasks:
> >> - TaskA: Nice=10 (i.e. weight=110)
> >> - Task[B,C,D]: Nice=15 (i.e. weight=36)
> >>
> >> Then balancing on nr_running would yield a placement with 2 tasks
> >> on each CPU:
> >> - CPU0: TaskA + TaskB
> >> Total weight = 110 + 36 = 146
> >> - CPU1: TaskC + TaskD
> >> Total weight = 36 + 36 = 72
> >> With such placement:
> >> - TaskA and TaskB are receiving less throughput
> >> - TaskC and TaskD are receiving more throughput
> >> than what they would if the placement was balanced.
> >>
> >> This is not compliant with the scheduler Nice interface.
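(Side note: the arithmetic above can be checked with a small sketch. The weights come from the kernel's sched_prio_to_weight table (nice 10 -> 110, nice 15 -> 36), but the proportional-share model and the two placements below are illustrative only, not kernel code.)

```python
# Per-task CPU share under proportional weight scheduling:
# on each CPU, a task's share is its weight divided by the sum
# of the weights of all tasks running on that CPU.

def shares(cpu_tasks):
    """cpu_tasks: dict cpu -> {task: weight}; returns task -> CPU share."""
    out = {}
    for tasks in cpu_tasks.values():
        total = sum(tasks.values())
        for name, weight in tasks.items():
            out[name] = weight / total
    return out

# nr_running-based placement: 2 tasks per CPU.
unbalanced = shares({0: {"A": 110, "B": 36}, 1: {"C": 36, "D": 36}})
# load-based placement: A alone (load 110) vs B, C, D together (load 108).
balanced = shares({0: {"A": 110}, 1: {"B": 36, "C": 36, "D": 36}})

# B gets ~0.25 of a CPU while the equally-weighted C and D each get 0.5;
# with the load-based placement, B, C and D all get 1/3 and A a full CPU.
print(unbalanced)
print(balanced)
```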
> > This is overthinking it. On a 2-core SMP system, with no uclamp and no EAS,
> > 4 always-busy tasks with different nice values will still be placed based on
> > load, and neither the wake-up path nor the load balancer has a notion of
> > nice-based throughput to manage task placement.
>
> Yes right, by setting the Nice value of tasks and using the
> associated weight (Nice=10 -> weight=110), I also meant that
> the load of these tasks was approximately equal to the weight.
> I.e.:
> - TaskA: Nice=10 <-> weight=110 <-> load=110
> - Task[B,C,D]: Nice=15 <-> weight=36 <-> load=36
> In that regard, the load balancer balances load between CPUs
> to try to provide equal throughput to all tasks
> (with respect to their weight or Nice value).
>
> I only have a doubt about the push mechanism for the setup with:
> - EAS
> - long-running tasks + UCLAMP_MAX
> because in that case the Nice value and the CPU load are ignored,
> leading to task placement that can be incorrect.

The previous rework of feec that I sent was a first step in that
direction: take into account not only energy but also other hints such
as nr_running and, later, the slice duration.

>
> Just to be sure, I am not arguing in the non-EAS case. As the
> load balancer is active in that case, there is a mechanism
> to have a global 'fairness' among CPUs.
> When EAS is active, the load balancer is disabled and there is
> no mechanism to manage the load between CPUs.
>
> Vincent's patchset was advertised to help EAS:
> "sched/fair: Add push task mechanism and handle more EAS cases"
> so I was more thinking about that case.

This is a starting point but the push task mechanism can be used for
other use cases too. One use case is pushing tasks to idle CPUs when
the system is overloaded, for example. The end goal is to call the
same select_task_rq function every time. And as Qais already said, we
could disable periodic load balancing at the LLC level or further
increase the period.


> If the goal is to have unified wake-up + load balancer framework
> I currently have nothing to object.
>
> (On a throughput-related subject)
> I am working on a mechanism to try to help handling throughput
> on HMP. This might be posted as RFC at some point, if you
> have some time to have a look later.
>
> >
> > Generally, with EEVDF, managing the slice size is better than the nice value,
> > and with the QoS framework we are proposing I think the nice value is better
> > locked down to 0. But we shall see.
> Maybe I'm completely off, but I thought the EEVDF slice length
> and the Nice values were handling different things. If you
> have a link that shows your QoS approach and how they interact,
> I'm interested.
> > Moreover, the idea is to enable the wake-up path to be multi-modal and
> > coherent with lb decisions (via push lb). So fixing all these problems is
> > possible in the future, fingers crossed without much added complexity. But
> > again, we shall see.