[PATCH] sched/fair: Restore env status before goto redo in load_balance()

From: Xunlei Pang
Date: Wed Mar 18 2015 - 03:06:38 EST


From: Xunlei Pang <pang.xunlei@xxxxxxxxxx>

In load_balance(), the LBF_DST_PINNED case assigns new values to
some members of lb_env. But lb_env::flags may still have
LBF_ALL_PINNED set if no suitable task is found afterwards, e.g.
because of another balance or a task affinity change. This can
really happen because the busiest rq lock has already been
released.

This is wrong. For example, if env.dst_cpu is still set to
new_dst_cpu when going back to the "redo" label,
should_we_balance() may return false, which is unreasonable.

This patch restores the proper state of env before "goto redo",
removes the "out_all_pinned" label, and renames "out_one_pinned"
to "out_active_balanced".

Signed-off-by: Xunlei Pang <pang.xunlei@xxxxxxxxxx>
---
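Note for reviewers (not part of the commit message): below is a
minimal user-space sketch of the failure mode described above. It
assumes a heavily simplified model of should_we_balance(); the toy_*
names, the bitmask stand-in for cpumasks, and the 4-CPU layout are
all made up for illustration and are not kernel code.

```c
#include <stdbool.h>

/* Toy stand-in for the few lb_env fields involved (hypothetical). */
struct toy_env {
	int dst_cpu;              /* CPU we are pulling tasks to */
	unsigned int cpus;        /* candidate CPUs, one bit per CPU */
	unsigned int group_cpus;  /* CPUs of the local sched group */
	unsigned int idle_cpus;   /* CPUs that are currently idle */
};

/* Lowest set bit as a CPU number, or -1 for an empty mask. */
static int toy_first_cpu(unsigned int mask)
{
	for (int cpu = 0; cpu < 32; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return -1;
}

/*
 * Simplified model: the first idle CPU of the local group that is
 * still in env->cpus does the balancing, else the group's first CPU.
 */
static bool toy_should_we_balance(const struct toy_env *env)
{
	int cpu = toy_first_cpu(env->group_cpus & env->cpus & env->idle_cpus);

	if (cpu == -1)
		cpu = toy_first_cpu(env->group_cpus);
	return cpu == env->dst_cpu;
}

/* State on first entry: this_cpu = 0 is idle, local group = {0, 1}. */
static struct toy_env toy_initial(void)
{
	struct toy_env env = {
		.dst_cpu = 0, .cpus = 0xf, .group_cpus = 0x3, .idle_cpus = 0x1,
	};
	return env;
}

/* Assumed state after the LBF_DST_PINNED case with new_dst_cpu = 1. */
static struct toy_env toy_stale(void)
{
	struct toy_env env = toy_initial();

	env.cpus &= ~(1u << env.dst_cpu);  /* old dst_cpu dropped from cpus */
	env.dst_cpu = 1;                   /* dst_cpu is now new_dst_cpu */
	return env;
}

/* Mirrors the restore done before "goto redo" (busiest CPU assumed 2). */
static struct toy_env toy_restored(void)
{
	struct toy_env env = toy_stale();
	unsigned int active = 0xf;         /* stands in for cpu_active_mask */

	env.cpus &= ~(1u << 2);            /* exclude the busiest CPU */
	env.cpus = (env.cpus | env.group_cpus) & active;
	env.dst_cpu = 0;                   /* back to this_cpu */
	return env;
}
```

In this model toy_should_we_balance() holds for toy_initial() and
toy_restored() but not for toy_stale(), which is the "redo" case the
patch is meant to avoid.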
kernel/sched/fair.c | 35 ++++++++++++++++++++---------------
1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ee595ef..45bbda1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6843,6 +6843,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 		.dst_cpu = this_cpu,
 		.dst_rq = this_rq,
 		.dst_grpmask = sched_group_cpus(sd->groups),
+		.new_dst_cpu = -1,
 		.idle = idle,
 		.loop_break = sched_nr_migrate_break,
 		.cpus = cpus,
@@ -6977,12 +6978,19 @@ more_balance:
 		/* All tasks on this runqueue were pinned by CPU affinity */
 		if (unlikely(env.flags & LBF_ALL_PINNED)) {
 			cpumask_clear_cpu(cpu_of(busiest), cpus);
-			if (!cpumask_empty(cpus)) {
-				env.loop = 0;
-				env.loop_break = sched_nr_migrate_break;
-				goto redo;
+			if (env.new_dst_cpu != -1) {
+				env.new_dst_cpu = -1;
+				cpumask_or(cpus, cpus,
+					sched_group_cpus(sd->groups));
+				cpumask_and(cpus, cpus, cpu_active_mask);
+
+				env.dst_cpu = this_cpu;
+				env.dst_rq = this_rq;
 			}
-			goto out_all_pinned;
+			env.flags &= ~LBF_SOME_PINNED;
+			env.loop = 0;
+			env.loop_break = sched_nr_migrate_break;
+			goto redo;
 		}
 	}

@@ -7009,7 +7017,7 @@ more_balance:
 				raw_spin_unlock_irqrestore(&busiest->lock,
 							    flags);
 				env.flags |= LBF_ALL_PINNED;
-				goto out_one_pinned;
+				goto out_active_balanced;
 			}

 			/*
@@ -7058,26 +7066,23 @@ more_balance:
 out_balanced:
 	/*
 	 * We reach balance although we may have faced some affinity
-	 * constraints. Clear the imbalance flag if it was set.
+	 * constraints.
+	 *
+	 * When LBF_ALL_PINNED was not set, clear the imbalance flag
+	 * if it was set.
 	 */
-	if (sd_parent) {
+	if (sd_parent && !(env.flags & LBF_ALL_PINNED)) {
 		int *group_imbalance = &sd_parent->groups->sgc->imbalance;

 		if (*group_imbalance)
 			*group_imbalance = 0;
 	}

-out_all_pinned:
-	/*
-	 * We reach balance because all tasks are pinned at this level so
-	 * we can't migrate them. Let the imbalance flag set so parent level
-	 * can try to migrate them.
-	 */
 	schedstat_inc(sd, lb_balanced[idle]);

 	sd->nr_balance_failed = 0;

-out_one_pinned:
+out_active_balanced:
 	/* tune up the balancing interval */
 	if (((env.flags & LBF_ALL_PINNED) &&
 			sd->balance_interval < MAX_PINNED_INTERVAL) ||
--
1.9.1

