Re: [PATCH 1/2] sched: fix and clean up calculate_imbalance
From: Vincent Guittot
Date: Tue Jul 29 2014 - 05:05:20 EST
On 28 July 2014 20:16, <riel@xxxxxxxxxx> wrote:
> From: Rik van Riel <riel@xxxxxxxxxx>
>
> There are several ways in which update_sd_pick_busiest can end up
> picking an sd as "busiest" that has a below-average per-cpu load.
>
> All of those could use the same correction that was previously only
> applied when the selected group has a group imbalance.
>
> Additionally, the load balancing code will balance out the load between
> domains that are below their maximum capacity. This results in the
> load_above_capacity calculation underflowing, creating a giant unsigned
> number, which is then removed by the min() check below.
load_above_capacity can't underflow with the current version. The
underflow that you mention above could occur with the change you are
making in patch 2, which can select a group that is neither overloaded
nor imbalanced.
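To make the underflow concrete, here is a minimal user-space sketch, not
kernel code. It mirrors the load_above_capacity arithmetic from the hunk
quoted further down. The scale values (1024), the integer types and the
names load_above_capacity_sketch() and min_ul() are assumptions made for
this illustration only.

```c
/* Assumed values for illustration; the kernel defines its own. */
#define SCHED_LOAD_SCALE     1024UL
#define SCHED_CAPACITY_SCALE 1024UL

/*
 * Hypothetical user-space copy of the load_above_capacity arithmetic.
 * The counters are unsigned, so sum_nr_running < group_capacity_factor
 * wraps around to a very large value instead of going negative.
 */
static unsigned long load_above_capacity_sketch(unsigned int sum_nr_running,
						unsigned int group_capacity_factor,
						unsigned long group_capacity)
{
	unsigned long load_above_capacity;

	load_above_capacity = sum_nr_running - group_capacity_factor;
	load_above_capacity *= (SCHED_LOAD_SCALE * SCHED_CAPACITY_SCALE);
	load_above_capacity /= group_capacity;

	return load_above_capacity;
}

/* Plain helper standing in for the kernel's min() macro. */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}
```

With sum_nr_running = 4, group_capacity_factor = 2 and group_capacity =
2048 the sketch returns 1024. With sum_nr_running = 1 the subtraction
wraps, and only the later min() against the avg_load difference hides
the bogus value. That second case needs a busiest group that runs fewer
tasks than its capacity factor, i.e. one that is not overloaded.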
>
> In situations where all the domains are overloaded, or where only the
> busiest domain is overloaded, that code is also superfluous, since
> the normal env->imbalance calculation will figure out how much to move.
> Remove the load_above_capacity calculation.
IMHO, we should not remove that part, which is used by prefer_sibling.
Originally, we had 2 types of busiest group: overloaded or imbalanced.
You add a new one which only has an avg_load higher than the others, so
you should handle this new case and keep the other ones unchanged.
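One possible reading of this suggestion, as a user-space sketch only:
the two existing cases keep their current max_pull computation and the
new case gets its own branch. The enum, the function name, the scale
values and the integer types are hypothetical. In particular, skipping
the capacity-based limit for the new case is an assumption made for the
sketch, not something decided in this thread.

```c
/* Assumed values for illustration; the kernel defines its own. */
#define SCHED_LOAD_SCALE     1024UL
#define SCHED_CAPACITY_SCALE 1024UL

/* Hypothetical classification of why a group was picked as busiest. */
enum busiest_kind_sketch {
	BUSIEST_OVERLOADED,	/* more tasks than its capacity factor */
	BUSIEST_IMBALANCED,	/* the group_imb case */
	BUSIEST_ABOVE_AVG,	/* new case: only avg_load is higher */
};

/*
 * Hypothetical helper computing max_pull per kind of busiest group.
 * The caller must ensure busiest_avg_load > sds_avg_load.
 */
static unsigned long max_pull_sketch(enum busiest_kind_sketch kind,
				     unsigned long busiest_avg_load,
				     unsigned long sds_avg_load,
				     unsigned int sum_nr_running,
				     unsigned int group_capacity_factor,
				     unsigned long group_capacity)
{
	unsigned long load_above_capacity = ~0UL;
	unsigned long diff = busiest_avg_load - sds_avg_load;

	if (kind == BUSIEST_OVERLOADED) {
		/* Unchanged: don't pull so many tasks that the group goes idle. */
		load_above_capacity = sum_nr_running - group_capacity_factor;
		load_above_capacity *= (SCHED_LOAD_SCALE * SCHED_CAPACITY_SCALE);
		load_above_capacity /= group_capacity;
	}
	/*
	 * BUSIEST_IMBALANCED: unchanged, no capacity-based limit.
	 * BUSIEST_ABOVE_AVG: assumption for this sketch, no limit either,
	 * because the subtraction above could wrap for such a group.
	 */

	return diff < load_above_capacity ? diff : load_above_capacity;
}
```

Keeping the limit inside the overloaded branch means the unsigned
subtraction only runs when sum_nr_running is known to exceed
group_capacity_factor.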
>
> Signed-off-by: Rik van Riel <riel@xxxxxxxxxx>
> ---
> kernel/sched/fair.c | 33 ++++++++-------------------------
> 1 file changed, 8 insertions(+), 25 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 45943b2..a28bb3b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6221,16 +6221,16 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
> */
> static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
> {
> - unsigned long max_pull, load_above_capacity = ~0UL;
> struct sg_lb_stats *local, *busiest;
>
> local = &sds->local_stat;
> busiest = &sds->busiest_stat;
>
> - if (busiest->group_imb) {
> + if (busiest->avg_load <= sds->avg_load) {
busiest->avg_load <= sds->avg_load is already handled in the
fix_small_imbalance function; you should probably handle that here.
> /*
> - * In the group_imb case we cannot rely on group-wide averages
> - * to ensure cpu-load equilibrium, look at wider averages. XXX
> + * Busiest got picked because it is overloaded or imbalanced,
> + * but does not have an above-average load. Look at wider
> + * averages.
> */
> busiest->load_per_task =
> min(busiest->load_per_task, sds->avg_load);
> @@ -6247,32 +6247,15 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
> return fix_small_imbalance(env, sds);
> }
>
> - if (!busiest->group_imb) {
> - /*
> - * Don't want to pull so many tasks that a group would go idle.
> - * Except of course for the group_imb case, since then we might
> - * have to drop below capacity to reach cpu-load equilibrium.
> - */
> - load_above_capacity =
> - (busiest->sum_nr_running - busiest->group_capacity_factor);
> -
> - load_above_capacity *= (SCHED_LOAD_SCALE * SCHED_CAPACITY_SCALE);
> - load_above_capacity /= busiest->group_capacity;
> - }
> -
> /*
> * We're trying to get all the cpus to the average_load, so we don't
> * want to push ourselves above the average load, nor do we wish to
> - * reduce the max loaded cpu below the average load. At the same time,
> - * we also don't want to reduce the group load below the group capacity
> - * (so that we can implement power-savings policies etc). Thus we look
> - * for the minimum possible imbalance.
> + * reduce the max loaded cpu below the average load.
> + * The per-cpu avg_load values and the group capacity determine
> + * how much load to move to equalise the imbalance.
> */
> - max_pull = min(busiest->avg_load - sds->avg_load, load_above_capacity);
> -
> - /* How much load to actually move to equalise the imbalance */
> env->imbalance = min(
> - max_pull * busiest->group_capacity,
> + (busiest->avg_load - sds->avg_load) * busiest->group_capacity,
> (sds->avg_load - local->avg_load) * local->group_capacity
> ) / SCHED_CAPACITY_SCALE;
>
> --
> 1.9.3
>