[PATCH 1/2] sched: fix and clean up calculate_imbalance

From: riel
Date: Mon Jul 28 2014 - 14:16:59 EST


From: Rik van Riel <riel@xxxxxxxxxx>

There are several ways in which update_sd_pick_busiest can end up
picking a sched_group as "busiest" that has a below-average per-cpu
load.

All of those cases could use the same correction that was previously
applied only when the selected group had a group imbalance.
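
The correction in question is just a clamp of the group's per-task
load to the domain-wide average. A rough standalone sketch, using
invented numbers in place of the real struct sg_lb_stats fields:

#include <stdio.h>

int main(void)
{
        /* Invented loads: the picked group's per-task load sits
         * above the domain-wide average per-cpu load. */
        unsigned long load_per_task = 2048;
        unsigned long sds_avg_load = 1024;

        /* The correction: don't trust the group-wide number,
         * clamp it to the wider average. */
        if (load_per_task > sds_avg_load)
                load_per_task = sds_avg_load;

        printf("load_per_task = %lu\n", load_per_task); /* 1024 */
        return 0;
}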

Additionally, the load balancing code will balance out the load
between groups that are below their maximum capacity. This makes the
load_above_capacity calculation underflow, producing a giant unsigned
number, which is then discarded by the min() check below.
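
A minimal standalone sketch of that underflow, with made-up numbers
rather than the kernel's real statistics:

#include <stdio.h>

int main(void)
{
        /* Invented values: a group running fewer tasks than its
         * capacity factor allows, i.e. below maximum capacity. */
        unsigned long sum_nr_running = 2;
        unsigned long group_capacity_factor = 4;
        unsigned long load_above_capacity;

        /* The unsigned subtraction wraps around. */
        load_above_capacity = sum_nr_running - group_capacity_factor;

        /* Prints 18446744073709551614 on 64-bit: the giant number
         * that the min() check then throws away. */
        printf("%lu\n", load_above_capacity);
        return 0;
}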

In situations where all the groups are overloaded, or where only the
busiest group is overloaded, that code is superfluous as well, since
the normal env->imbalance calculation will figure out how much load
to move. Remove the load_above_capacity calculation.
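
For illustration, the remaining env->imbalance calculation with
invented avg_load and capacity numbers (SCHED_CAPACITY_SCALE is
1024); a sketch, not the kernel code:

#include <stdio.h>

#define SCHED_CAPACITY_SCALE 1024UL

static unsigned long min_ul(unsigned long a, unsigned long b)
{
        return a < b ? a : b;
}

int main(void)
{
        /* Invented per-cpu loads: busiest above and local below
         * the domain average; both groups at nominal capacity. */
        unsigned long busiest_avg_load = 1536, local_avg_load = 512;
        unsigned long sds_avg_load = 1024;
        unsigned long busiest_capacity = 1024, local_capacity = 1024;
        unsigned long imbalance;

        /* min() of "pull busiest down to the average" and "push
         * local up to the average", scaled by group capacity. */
        imbalance = min_ul((busiest_avg_load - sds_avg_load) * busiest_capacity,
                           (sds_avg_load - local_avg_load) * local_capacity)
                    / SCHED_CAPACITY_SCALE;

        printf("imbalance = %lu\n", imbalance); /* 512 */
        return 0;
}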

Signed-off-by: Rik van Riel <riel@xxxxxxxxxx>
---
kernel/sched/fair.c | 33 ++++++++-------------------------
1 file changed, 8 insertions(+), 25 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 45943b2..a28bb3b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6221,16 +6221,16 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
*/
static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
{
- unsigned long max_pull, load_above_capacity = ~0UL;
struct sg_lb_stats *local, *busiest;

local = &sds->local_stat;
busiest = &sds->busiest_stat;

- if (busiest->group_imb) {
+ if (busiest->avg_load <= sds->avg_load) {
/*
- * In the group_imb case we cannot rely on group-wide averages
- * to ensure cpu-load equilibrium, look at wider averages. XXX
+ * Busiest got picked because it is overloaded or imbalanced,
+ * but does not have an above-average load. Look at wider
+ * averages.
*/
busiest->load_per_task =
min(busiest->load_per_task, sds->avg_load);
@@ -6247,32 +6247,15 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
return fix_small_imbalance(env, sds);
}

- if (!busiest->group_imb) {
- /*
- * Don't want to pull so many tasks that a group would go idle.
- * Except of course for the group_imb case, since then we might
- * have to drop below capacity to reach cpu-load equilibrium.
- */
- load_above_capacity =
- (busiest->sum_nr_running - busiest->group_capacity_factor);
-
- load_above_capacity *= (SCHED_LOAD_SCALE * SCHED_CAPACITY_SCALE);
- load_above_capacity /= busiest->group_capacity;
- }
-
/*
* We're trying to get all the cpus to the average_load, so we don't
* want to push ourselves above the average load, nor do we wish to
- * reduce the max loaded cpu below the average load. At the same time,
- * we also don't want to reduce the group load below the group capacity
- * (so that we can implement power-savings policies etc). Thus we look
- * for the minimum possible imbalance.
+ * reduce the max loaded cpu below the average load.
+ * The per-cpu avg_load values and the group capacity determine
+ * how much load to move to equalise the imbalance.
*/
- max_pull = min(busiest->avg_load - sds->avg_load, load_above_capacity);
-
- /* How much load to actually move to equalise the imbalance */
env->imbalance = min(
- max_pull * busiest->group_capacity,
+ (busiest->avg_load - sds->avg_load) * busiest->group_capacity,
(sds->avg_load - local->avg_load) * local->group_capacity
) / SCHED_CAPACITY_SCALE;

--
1.9.3
