Re: [PATCH] sched/fair: Don't pull task if local group is more loaded than busiest group
From: Li, Aubrey
Date: Wed Mar 25 2020 - 21:51:53 EST
On 2020/3/25 21:43, Phil Auld wrote:
> Hi Aubrey,
>
> On Wed, Mar 25, 2020 at 08:46:28PM +0800 Aubrey Li wrote:
>> A huge number of load imbalance was observed when the local group
>> type is group_fully_busy, and the average load of local group is
>> greater than the selected busiest group, so the imbalance calculation
>> returns a negative value actually. Fix this problem by comparing the
>> average load before local group type check.
>>
>> Signed-off-by: Aubrey Li <aubrey.li@xxxxxxxxxxxxxxx>
>> ---
>> kernel/sched/fair.c | 14 +++++++-------
>> 1 file changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index c1217bf..c524369 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8862,17 +8862,17 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
>> goto out_balanced;
>>
>> /*
>> + * If the local group is more loaded than the selected
>> + * busiest group don't try to pull any tasks.
>> + */
>> + if (local->avg_load >= busiest->avg_load)
>> + goto out_balanced;
>> +
>> + /*
>> * When groups are overloaded, use the avg_load to ensure fairness
>> * between tasks.
>> */
>> if (local->group_type == group_overloaded) {
>> - /*
>> - * If the local group is more loaded than the selected
>> - * busiest group don't try to pull any tasks.
>> - */
>> - if (local->avg_load >= busiest->avg_load)
>> - goto out_balanced;
>> -
>> /* XXX broken for overlapping NUMA groups */
>> sds.avg_load = (sds.total_load * SCHED_CAPACITY_SCALE) /
>> sds.total_capacity;
>> --
>> 2.7.4
>>
>
> I'm not sure about this. I think this patch will undo a good bit of the
> benefit of the load balancer rework. Avg_load is really most useful
> when things are overloaded. If we go back to looking at it here we may
> fail to balance when needed.
>
> There are cases where, due to group scheduler load scaling, local average
> may be higher but have spare CPUs still whereas busiest may have extra
> processes which be balanced.
>
In my case, local group is fully busy, and the selected busiest group is overloaded,
should we compute avg_idle for group_fully_busy as well?
Thanks,
-Aubrey