[PATCH v2] sched: make update_sd_pick_busiest return true on a busier sd
From: Rik van Riel
Date: Fri Jul 25 2014 - 15:32:58 EST
Subject: sched: make update_sd_pick_busiest return true on a busier sd
Currently update_sd_pick_busiest only identifies the busiest sd
that is either overloaded, or has a group imbalance. When no
sd is imbalanced or overloaded, the load balancer fails to find
the busiest domain.

This breaks load balancing between domains that are not overloaded,
in the !SD_ASYM_PACKING case. This patch makes update_sd_pick_busiest
return true whenever it encounters the busiest sd seen so far.

Behaviour for SD_ASYM_PACKING does not seem to match the comment,
but I have no hardware to test that, so I have left the behaviour
of that code unchanged.

It is unclear what to do with the group_imb condition.
Should group_imb override a busier load? If so, should we fix
calculate_imbalance to return a sensible number when the "busiest"
node found has a below average load? We probably need to fix
calculate_imbalance anyway, to deal with an overloaded group that
happens to have a below average load...

Cc: mikey@xxxxxxxxxxx
Cc: peterz@xxxxxxxxxxxxx
Signed-off-by: Rik van Riel <riel@xxxxxxxxxx>
---
kernel/sched/fair.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 45943b2..c96044f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5949,6 +5949,11 @@ static inline void update_sg_lb_stats(struct lb_env *env,
sgs->group_has_free_capacity = 1;
}
+static bool group_overloaded(struct sg_lb_stats *sgs)
+{
+ return sgs->sum_nr_running > sgs->group_capacity_factor;
+}
+
/**
* update_sd_pick_busiest - return 1 on busiest group
* @env: The load balancing environment.
@@ -5957,7 +5962,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
* @sgs: sched_group statistics
*
* Determine if @sg is a busier group than the previously selected
- * busiest group.
+ * busiest group.
*
* Return: %true if @sg is a busier group than the previously selected
* busiest group. %false otherwise.
@@ -5967,13 +5972,17 @@ static bool update_sd_pick_busiest(struct lb_env *env,
struct sched_group *sg,
struct sg_lb_stats *sgs)
{
+ if (group_overloaded(sgs) && !group_overloaded(&sds->busiest_stat))
+ return true;
+
if (sgs->avg_load <= sds->busiest_stat.avg_load)
return false;
- if (sgs->sum_nr_running > sgs->group_capacity_factor)
+ if (sgs->group_imb)
return true;
- if (sgs->group_imb)
+ /* This is the busiest node. */
+ if (!(env->sd->flags & SD_ASYM_PACKING))
return true;
/*
@@ -5981,8 +5990,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
* numbered CPUs in the group, therefore mark all groups
* higher than ourself as busy.
*/
- if ((env->sd->flags & SD_ASYM_PACKING) && sgs->sum_nr_running &&
- env->dst_cpu < group_first_cpu(sg)) {
+ if (sgs->sum_nr_running && env->dst_cpu < group_first_cpu(sg)) {
if (!sds->busiest)
return true;