Re: [PATCH V5 2/2] sched/fair: Remove group imbalance from calculate_imbalance()
From: Peter Zijlstra
Date: Wed Jul 05 2017 - 07:22:33 EST
On Wed, Jun 07, 2017 at 01:18:58PM -0600, Jeffrey Hugo wrote:
> The group_imbalance path in calculate_imbalance() made sense when it was
> added back in 2007 with commit 908a7c1b9b80 ("sched: fix improper load
> balance across sched domain") because busiest->load_per_task factored into
> the amount of imbalance that was calculated. That is not the case today.

It would be nice to have some more information on which patch(es)

> The group_imbalance path can only affect the outcome of
> calculate_imbalance() when the average load of the domain is less than the
> original busiest->load_per_task. In this case, busiest->load_per_task is
> overwritten with the scheduling domain load average. Thus
> busiest->load_per_task no longer represents actual load that can be moved.
> At the final comparison between env->imbalance and busiest->load_per_task,
> imbalance may be larger than the new busiest->load_per_task causing the
> check to fail under the assumption that there is a task that could be
> migrated to satisfy the imbalance. However env->imbalance may still be
> smaller than the original busiest->load_per_task, thus it is unlikely that
> there is a task that can be migrated to satisfy the imbalance.
> Calculate_imbalance() would not choose to run fix_small_imbalance() when we
> expect it should. In the worst case, this can result in idle cpus.
> Since the group imbalance path in calculate_imbalance() is at best a NOP
> but otherwise harmful, remove it.
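
The check described in the quoted patch can be modeled roughly as below. This is a simplified sketch, not the kernel code. `struct busiest_stats`, `min_ul()` and `takes_small_imbalance_path()` are hypothetical names. The model keeps only two steps: the group_imbalance clamp, which overwrites busiest->load_per_task with the domain average when that average is smaller, and the final `imbalance < load_per_task` comparison that decides whether fix_small_imbalance() runs.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in; NOT the kernel's struct sg_lb_stats. */
struct busiest_stats {
	unsigned long load_per_task; /* average load of one runnable task */
	bool group_imbalanced;       /* models group_type == group_imbalanced */
};

static inline unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Returns true when this model would hand off to fix_small_imbalance(),
 * i.e. when the computed imbalance is smaller than the (possibly
 * overwritten) per-task load of the busiest group.
 * 'busiest' is passed by value, so the clamp does not leak to the caller.
 */
bool takes_small_imbalance_path(struct busiest_stats busiest,
				unsigned long domain_avg_load,
				unsigned long imbalance)
{
	if (busiest.group_imbalanced) {
		/* The path the patch removes: clamp to the domain average. */
		busiest.load_per_task = min_ul(busiest.load_per_task,
					       domain_avg_load);
	}
	return imbalance < busiest.load_per_task;
}
```

The following numbers are arbitrary and chosen only so that domain average (600) < imbalance (800) < original load_per_task (1024). Without the clamp, `takes_small_imbalance_path()` returns true and fix_small_imbalance() would run. With `group_imbalanced` set, load_per_task becomes 600, the comparison `800 < 600` fails, and the small-imbalance path is skipped. That happens even though no task is likely to be light enough to satisfy an 800 imbalance, which is the failure mode the patch describes.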

load_per_task is horrible and should die. Ever since we did cgroup
support the number has been complete crap, but even before that the concept
Most of the logic that uses the number stems from the pre-smp-nice era.
This also of course means that fix_small_imbalance() is probably a load
of crap. Digging through all that has been on the todo list for a long
while but somehow not something I've ever gotten to :/