Re: [PATCH 2/4] sched/fair: reduce minimal imbalance threshold

From: Valentin Schneider
Date: Wed Sep 16 2020 - 04:33:58 EST



On 16/09/20 07:53, Vincent Guittot wrote:
> On Tue, 15 Sep 2020 at 21:04, Valentin Schneider
> <valentin.schneider@xxxxxxx> wrote:
>> AIUI this is the culprit:
>>
>>         if (100 * busiest->avg_load <=
>>                         env->sd->imbalance_pct * local->avg_load)
>>                 goto out_balanced;
>>
>> As in your case, imbalance_pct=120 becomes the tipping point.
>>
>> Now, ultimately this would need to scale with the underlying topology,
>> right? If you have a system with 2x32 cores running {33 threads, 34
>> threads}, the tipping point becomes imbalance_pct≈103; but then, since
>> you have that many more cores, it becomes somewhat questionable whether
>> such a small imbalance is worth chasing at all.
>
> I wanted to stay conservative and not trigger too many task
> migrations because of a small imbalance, so I decided to decrease the
> default threshold to the same level as the MC groups; but this can
> still generate unfairness. With your example of 2x32 cores, if you end
> up with 33 tasks in one group and 38 in the other, the system is
> overloaded, so you use load and imbalance_pct, but the imbalance will
> stay below the new threshold and the 33 tasks will get 13% more
> running time.
>
> This new imbalance_pct seems like a reasonable step towards decreasing
> the unfairness.
>
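
To make the quoted arithmetic concrete, here is a tiny userspace sketch
of the tipping-point computation. The variable names are made up for
illustration (this is not kernel code), and it assumes per-task load is
roughly equal, so a group's avg_load scales with its task count:

#include <stdio.h>

int main(void)
{
        /* hypothetical 2x32-core machine: 33 vs 34 runnable threads */
        unsigned int local_tasks = 33, busiest_tasks = 34;

        /*
         * The out_balanced check quoted above,
         *   100 * busiest->avg_load <= imbalance_pct * local->avg_load,
         * flips once imbalance_pct drops to 100 * busiest / local.
         */
        unsigned int tipping = 100 * busiest_tasks / local_tasks;

        printf("tipping imbalance_pct ~= %u\n", tipping); /* 103 */
        return 0;
}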

No major complaint on the change itself; it's just that this static
imbalance_pct assignment is something I've never really been satisfied
with. Then again, deriving a correct value (or several) from the topology
isn't straightforward either.
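
As a strawman, and purely for illustration (no such helper exists),
"scaling from the topology" could be as simple as tolerating one task's
worth of load per group:

/*
 * Illustrative sketch only, not an actual kernel helper: derive the
 * threshold from the group weight so that the tolerated imbalance is
 * roughly one task's worth of load.
 */
static inline unsigned int sd_imbalance_pct(unsigned int group_weight)
{
        /* 2 cores -> 150, 4 -> 125, 32 -> 103 */
        return 100 + 100 / group_weight;
}

That at least lines up with the ≈103 figure above for 32-core groups,
but whether it makes any sense for small groups is exactly the
non-straightforward part.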

At the same time, I believe Peter would be happy to get rid of the decimal
faff and make it all simple shifts, which would limit how much we can
fine-tune these values (not necessarily a bad thing).
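
For the record, I'd expect the shift variant to look something like the
below; imbalance_shift is a made-up field, just to show the shape:

/*
 * Hypothetical shift-based variant of the check quoted above;
 * sd->imbalance_shift does not exist. shift=1 tolerates +50%,
 * shift=2 +25%, shift=3 +12.5%, and so on.
 */
if (busiest->avg_load <=
    local->avg_load + (local->avg_load >> env->sd->imbalance_shift))
        goto out_balanced;

i.e. the available margins are 50%, 25%, 12.5%, ..., which is precisely
what bounds the fine-tuning.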