Re: [PATCH v9] sched/fair: Filter false overloaded_group case for EAS
From: Christian Loehle
Date: Wed Feb 18 2026 - 11:13:10 EST
On 2/17/26 01:03, Qais Yousef wrote:
> On 02/12/26 09:55, Christian Loehle wrote:
>> On 2/11/26 01:48, Qais Yousef wrote:
>>> On 02/06/26 10:54, Vincent Guittot wrote:
>>>> With EAS, a group should be set overloaded if at least 1 CPU in the group
>>>> is overutilized, but it can happen that a CPU is fully utilized by tasks
>>>> because the compute capacity of the CPU is clamped. In such a case, the CPU
>>>> is not overutilized and, as a result, the group should not be set overloaded either.
>>>>
>>>> Since group_overloaded has a higher priority than group_misfit, such a group can
>>>> be selected as the busiest group instead of a group with a misfit task, which
>>>> prevents load_balance from selecting the CPU with the misfit task to pull
>>>> the latter onto a fitting CPU.
>>>>
>>>> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
>>>> Tested-by: Pierre Gondois <pierre.gondois@xxxxxxx>
>>>> ---
>>>>
>>>> This patch was part of a larger patchset [1] but makes sense on its own and has
>>>> not changed since v2.
>>>>
>>>> [1] https://lore.kernel.org/all/20251202181242.1536213-1-vincent.guittot@xxxxxxxxxx/
>>>
>>> I don't mind this. But I think with the original series misfit will be handled
>>> better with push lb, and if it is made to handle the overloaded case (which my
>>> initial testing shows is easily doable, and I can't see a clear bad impact
>>> yet), I think we can retire overutilized altogether.
>>>
>>
>> The EAS wakeup path (and therefore the push lb, for that matter) is costly, and
>> workloads are sensitive to it; it's trivial to see with hackbench. Overutilized prevents that.
>
> What workloads? I have been testing this and all I am seeing are great results
> so far.
>
> Hackbench is a super synthetic test that doesn't represent any real workload.
> It purely measures context switch overhead. I think I said this before, but
> I'll repeat it again: for most modern systems and workloads we really need to
> spend more time making sure we make the correct task placement decision, as the
> cost of a wrong fast decision is worse than that of a slow correct one. And this
> is not something special about mobile systems; servers and others care too.
> Those who really don't want any additional overhead can just disable the
> static key.
There are quite a few systems and workloads, especially in servers / datacenters,
where the "fast cheap" placement is better...
But I guess that's going a bit off-topic now.
>
> FWIW I tried schbench, which is more realistic since it does something that
> represents a web server and measures throughput and latencies, and I got 10%
> better throughput, 27% better P99 and 49% better max latencies. And yes, OU was
> completely disabled when I ran this test.
>
> But disclaimer again: I backported an earlier (modified) version of the patch and
> am running it on a non-mainline kernel with out-of-tree changes applied that I
> think help demonstrate the benefit even better.
So I'm assuming this was with the old series that still changed feec() to try
a 'latency-aware' placement; otherwise the improvements you state don't
make sense to me.
>
> Vincent, I am trying to stress the importance of the work and its great
> potential. I am not expecting the initial merge to handle everything yet ;-)
>
>> Arguments about PELT inaccuracies during periods of unmet compute demand (and therefore
>> entirely bogus EAS computation results) aside, I don't see how a push lb could retire
>> OU. If anything, you're then paying the price twice in these scenarios?
>
> I am not seeing any price to be paid. Geekbench scores are within run-to-run
> variation.
Hackbench isn't the only one here; I can put together an overview too.
There's definitely a measurable Speedometer 3.1 score regression with "never-OU", too.
Again though, if you have tested this extensively and think the improvements outweigh
the regressions, please do share the setup and results.