[PATCH 1/7 v3] sched/fair: Filter false overloaded_group case for EAS

From: Vincent Guittot
Date: Fri Feb 28 2025 - 08:40:19 EST


With EAS, a group should be set overloaded if at least 1 CPU in the group
is overutilized but it can happen that a CPU is fully utilized by tasks
because of clamping the compute capacity of the CPU. In such case, the CPU
is not overutilized and as a result should not be set overloaded as well.

group_overloaded being a higher priority than group_misfit, such group can
be selected as the busiest group instead of a group with a mistfit task
and prevents load_balance to select the CPU with the misfit task to pull
the latter on a fitting CPU.

Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
Tested-by: Pierre Gondois <pierre.gondois@xxxxxxx>
---
kernel/sched/fair.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 857808da23d8..d3d1a2ba6b1a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9931,6 +9931,7 @@ struct sg_lb_stats {
unsigned int group_asym_packing; /* Tasks should be moved to preferred CPU */
unsigned int group_smt_balance; /* Task on busy SMT be moved */
unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */
+ unsigned int group_overutilized; /* At least one CPU is overutilized in the group */
#ifdef CONFIG_NUMA_BALANCING
unsigned int nr_numa_running;
unsigned int nr_preferred_running;
@@ -10163,6 +10164,13 @@ group_has_capacity(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
static inline bool
group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
{
+ /*
+ * With EAS and uclamp, 1 CPU in the group must be overutilized to
+ * consider the group overloaded.
+ */
+ if (sched_energy_enabled() && !sgs->group_overutilized)
+ return false;
+
if (sgs->sum_nr_running <= sgs->group_weight)
return false;

@@ -10374,8 +10382,10 @@ static inline void update_sg_lb_stats(struct lb_env *env,
nr_running = rq->nr_running;
sgs->sum_nr_running += nr_running;

- if (cpu_overutilized(i))
+ if (cpu_overutilized(i)) {
*sg_overutilized = 1;
+ sgs->group_overutilized = 1;
+ }

/*
* No need to call idle_cpu() if nr_running is not 0
--
2.43.0