Re: [PATCH 1/5] sched/fair: Filter false overloaded_group case for EAS

From: Hongyan Xia
Date: Mon Sep 02 2024 - 05:01:35 EST


On 30/08/2024 14:03, Vincent Guittot wrote:
With EAS, a group should be set overloaded if at least 1 CPU in the group
is overutilized bit it can happen that a CPU is fully utilized by tasks
because of clamping the compute capacity of the CPU. In such case, the CPU
is not overutilized and as a result should not be set overloaded as well.

group_overloaded being a higher priority than group_misfit, such group can
be selected as the busiest group instead of a group with a mistfit task
and prevents load_balance to select the CPU with the misfit task to pull
the latter on a fitting CPU.

Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
---
kernel/sched/fair.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fea057b311f6..e67d6029b269 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9806,6 +9806,7 @@ struct sg_lb_stats {
enum group_type group_type;
unsigned int group_asym_packing; /* Tasks should be moved to preferred CPU */
unsigned int group_smt_balance; /* Task on busy SMT be moved */
+ unsigned long group_overutilized; /* No CPU is overutilized in the group */

Does this have to be unsigned long? I think a shorter width like bool (or int to be consistent with other fields) expresses the intention.

Also the comment to me is a bit confusing. All other fields are positive but this one's comment is in a negative tone.

unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */
#ifdef CONFIG_NUMA_BALANCING
unsigned int nr_numa_running;
@@ -10039,6 +10040,13 @@ group_has_capacity(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
static inline bool
group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
{
+ /*
+ * With EAS and uclamp, 1 CPU in the group must be overutilized to
+ * consider the group overloaded.
+ */
+ if (sched_energy_enabled() && !sgs->group_overutilized)
+ return false;
+
if (sgs->sum_nr_running <= sgs->group_weight)
return false;
@@ -10252,8 +10260,10 @@ static inline void update_sg_lb_stats(struct lb_env *env,
if (nr_running > 1)
*sg_overloaded = 1;
- if (cpu_overutilized(i))
+ if (cpu_overutilized(i)) {
*sg_overutilized = 1;
+ sgs->group_overutilized = 1;
+ }
#ifdef CONFIG_NUMA_BALANCING
sgs->nr_numa_running += rq->nr_numa_running;