Re: [PATCH] sched/fair: Don't pull task if local group is more loaded than busiest group

From: Vincent Guittot
Date: Wed Mar 25 2020 - 09:58:27 EST


Le mercredi 25 mars 2020 à 20:46:28 (+0800), Aubrey Li a écrit :
> A huge number of load imbalance was observed when the local group
> type is group_fully_busy, and the average load of local group is
> greater than the selected busiest group, so the imbalance calculation
> returns a negative value actually. Fix this problem by comparing the

Do you see any wrong task migration ? because if imbalance is < 0, detach_tasks should return early

> average load before local group type check.
>
> Signed-off-by: Aubrey Li <aubrey.li@xxxxxxxxxxxxxxx>
> ---
> kernel/sched/fair.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c1217bf..c524369 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8862,17 +8862,17 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
> goto out_balanced;
>
> /*
> + * If the local group is more loaded than the selected
> + * busiest group don't try to pull any tasks.
> + */
> + if (local->avg_load >= busiest->avg_load)

If local is not overloaded, local->avg_load is null because it has not been already computed.

You should better add such test in calculate_imbalance after we computed local->avg_load and set imbalance to NULL

something like:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9e544e702f66..62f0861cefc0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9152,6 +9152,15 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s

sds->avg_load = (sds->total_load * SCHED_CAPACITY_SCALE) /
sds->total_capacity;
+
+ /*
+ * If the local group is more loaded than the selected
+ * busiest group don't try to pull any tasks.
+ */
+ if (local->avg_load >= busiest->avg_load) {
+ env->imbalance =0;
+ return;
+ }
}

/*


> + goto out_balanced;
> +
> + /*
> * When groups are overloaded, use the avg_load to ensure fairness
> * between tasks.
> */
> if (local->group_type == group_overloaded) {
> - /*
> - * If the local group is more loaded than the selected
> - * busiest group don't try to pull any tasks.
> - */
> - if (local->avg_load >= busiest->avg_load)
> - goto out_balanced;
> -
> /* XXX broken for overlapping NUMA groups */
> sds.avg_load = (sds.total_load * SCHED_CAPACITY_SCALE) /
> sds.total_capacity;
> --
> 2.7.4
>