[PATCH 2/2] sched: make update_sd_pick_busiest return true on a busier sd

From: riel
Date: Mon Jul 28 2014 - 14:16:54 EST


From: Rik van Riel <riel@xxxxxxxxxx>

Currently update_sd_pick_busiest only identifies the busiest sd
that is either overloaded, or has a group imbalance. When no
sd is imbalanced or overloaded, the load balancer fails to find
the busiest domain.

This breaks load balancing between domains that are not overloaded,
in the !SD_ASYM_PACKING case. This patch makes update_sd_pick_busiest
return true when the busiest sd yet is encountered.

Groups are ranked in the order overloaded > imbalanced > other,
with higher ranked groups getting priority even when their load
is lower. This is necessary due to the possibility of unequal
capacities and cpumasks between domains within a sched group.

Behaviour for SD_ASYM_PACKING does not seem to match the comment,
but I have no hardware to test that so I have left the behaviour
of that code unchanged.

Enum for group classification suggested by Peter Zijlstra.

Cc: mikey@xxxxxxxxxxx
Cc: peterz@xxxxxxxxxxxxx
Acked-by: Michael Neuling <mikey@xxxxxxxxxxx>
Signed-off-by: Rik van Riel <riel@xxxxxxxxxx>
---
kernel/sched/fair.c | 34 ++++++++++++++++++++++++++++------
1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a28bb3b..4f5e3c2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5610,6 +5610,8 @@ static inline void init_sd_lb_stats(struct sd_lb_stats *sds)
.total_capacity = 0UL,
.busiest_stat = {
.avg_load = 0UL,
+ .sum_nr_running = 0,
+ .group_imb = 0,
},
};
}
@@ -5949,6 +5951,23 @@ static inline void update_sg_lb_stats(struct lb_env *env,
sgs->group_has_free_capacity = 1;
}

+enum group_type {
+ group_other = 0,
+ group_imbalanced,
+ group_overloaded,
+};
+
+static enum group_type group_classify(struct sg_lb_stats *sgs)
+{
+ if (sgs->sum_nr_running > sgs->group_capacity_factor)
+ return group_overloaded;
+
+ if (sgs->group_imb)
+ return group_imbalanced;
+
+ return group_other;
+}
+
/**
* update_sd_pick_busiest - return 1 on busiest group
* @env: The load balancing environment.
@@ -5967,13 +5986,17 @@ static bool update_sd_pick_busiest(struct lb_env *env,
struct sched_group *sg,
struct sg_lb_stats *sgs)
{
- if (sgs->avg_load <= sds->busiest_stat.avg_load)
+ if (group_classify(sgs) > group_classify(&sds->busiest_stat))
+ return true;
+
+ if (group_classify(sgs) < group_classify(&sds->busiest_stat))
return false;

- if (sgs->sum_nr_running > sgs->group_capacity_factor)
- return true;
+ if (sgs->avg_load <= sds->busiest_stat.avg_load)
+ return false;

- if (sgs->group_imb)
+ /* This is the busiest node in its class. */
+ if (!(env->sd->flags & SD_ASYM_PACKING))
return true;

/*
@@ -5981,8 +6004,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
* numbered CPUs in the group, therefore mark all groups
* higher than ourself as busy.
*/
- if ((env->sd->flags & SD_ASYM_PACKING) && sgs->sum_nr_running &&
- env->dst_cpu < group_first_cpu(sg)) {
+ if (sgs->sum_nr_running && env->dst_cpu < group_first_cpu(sg)) {
if (!sds->busiest)
return true;

--
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/