Re: [PATCH 1/2] sched: fix and clean up calculate_imbalance

From: Rik van Riel
Date: Tue Jul 29 2014 - 10:55:08 EST



On 07/29/2014 05:04 AM, Vincent Guittot wrote:
> On 28 July 2014 20:16, <riel@xxxxxxxxxx> wrote:
>> From: Rik van Riel <riel@xxxxxxxxxx>
>>
>> There are several ways in which update_sd_pick_busiest can end
>> up picking an sd as "busiest" that has a below-average per-cpu
>> load.
>>
>> All of those could use the same correction that was previously
>> only applied when the selected group has a group imbalance.
>>
>> Additionally, the load balancing code will balance out the load
>> between domains that are below their maximum capacity. This
>> results in the load_above_capacity calculation underflowing,
>> creating a giant unsigned number, which is then removed by the
>> min() check below.
>
> The load_above_capacity can't underflow with the current version. The
> underflow you mention above could occur with the change you are
> making in patch 2, which can select a group that is neither
> overloaded nor imbalanced.

With SD_ASYM_PACKING the current code can already hit the underflow.

You are right though that it does not become common until the second
patch has been applied.

>> In situations where all the domains are overloaded, or where only
>> the busiest domain is overloaded, that code is also superfluous,
>> since the normal env->imbalance calculation will figure out how
>> much to move. Remove the load_above_capacity calculation.
>
> IMHO, we should not remove that part, which is used by
> prefer_sibling.
>
> Originally, we had two types of busiest group: overloaded or
> imbalanced. You are adding a new one, whose avg_load is merely higher
> than the others', so you should handle this new case and keep the
> other ones unchanged.

The "overloaded" case will simply degenerate into the first case,
if we move enough load to make the domain no longer overloaded,
but still above average.

When the value from the "overloaded" calculation differs from the
above-average amount, the old code takes the minimum of the two as the
amount to move.

The new case you ask for would simply take the other part of that
difference, once a domain is no longer overloaded.

I cannot think of any case where keeping the "overloaded" code would
result in the code behaving differently over the long term.

What am I overlooking?

>> Signed-off-by: Rik van Riel <riel@xxxxxxxxxx>
>> ---
>>  kernel/sched/fair.c | 33 ++++++++-------------------------
>>  1 file changed, 8 insertions(+), 25 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 45943b2..a28bb3b 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6221,16 +6221,16 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
>>   */
>>  static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
>>  {
>> -	unsigned long max_pull, load_above_capacity = ~0UL;
>>  	struct sg_lb_stats *local, *busiest;
>>
>>  	local = &sds->local_stat;
>>  	busiest = &sds->busiest_stat;
>>
>> -	if (busiest->group_imb) {
>> +	if (busiest->avg_load <= sds->avg_load) {
>
> busiest->avg_load <= sds->avg_load is already handled in the
> fix_small_imbalance function, so you should probably handle that case
> there.

OK, I will move that code into fix_small_imbalance().

--
All rights reversed