[tip:sched/core] sched/numa: Use effective_load() to balance NUMA loads

From: tip-bot for Rik van Riel
Date: Sat Jul 05 2014 - 06:46:29 EST


Commit-ID: 6dc1a672ab15604947361dcd02e459effa09bad5
Gitweb: http://git.kernel.org/tip/6dc1a672ab15604947361dcd02e459effa09bad5
Author: Rik van Riel <riel@xxxxxxxxxx>
AuthorDate: Mon, 23 Jun 2014 11:46:14 -0400
Committer: Ingo Molnar <mingo@xxxxxxxxxx>
CommitDate: Sat, 5 Jul 2014 11:17:35 +0200

sched/numa: Use effective_load() to balance NUMA loads

When CONFIG_FAIR_GROUP_SCHED is enabled, the load that a task places
on a CPU is determined by the group the task is in. The active groups
on the source and destination CPU can be different, resulting in a
different load contribution by the same task at its source and at its
destination. As a result, the load needs to be calculated separately
for each CPU, instead of estimated once with task_h_load().

Getting this calculation right allows some workloads to converge,
where previously the last thread could get stuck on another node,
without being able to migrate to its final destination.
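
To illustrate why the load change has to be computed separately for each CPU, here is a minimal sketch of a toy, single-level group model. It is not the kernel's effective_load(): the names toy_group, toy_group_load() and toy_effective_load() are hypothetical, and the model assumes only two CPUs and that a group's share on a CPU is proportional to the weight the group has queued there.

```c
#include <assert.h>

/* Hypothetical toy model of one task group on two CPUs (not a kernel type). */
struct toy_group {
	long shares;        /* total weight the group is entitled to */
	long cpu_weight[2]; /* sum of the group's task weights queued per CPU */
};

/* Load the group contributes to a CPU: its proportional slice of shares. */
static long toy_group_load(long shares, long cpu_w, long total_w)
{
	if (total_w <= 0)
		return 0;
	return shares * cpu_w / total_w;
}

/*
 * Change in the group's load on @cpu if @wl is added to the weight queued
 * on that CPU and @wg is added to the group's total weight. The argument
 * order loosely mirrors the effective_load(tg, cpu, wl, wg) calls in the
 * patch, but the arithmetic is an assumption made for illustration.
 */
static long toy_effective_load(const struct toy_group *g, int cpu,
			       long wl, long wg)
{
	long total, before, after;

	assert(cpu == 0 || cpu == 1);

	total = g->cpu_weight[0] + g->cpu_weight[1];
	before = toy_group_load(g->shares, g->cpu_weight[cpu], total);
	after = toy_group_load(g->shares, g->cpu_weight[cpu] + wl, total + wg);

	return after - before;
}
```

In this toy model, take a group with shares = 1024 and cpu_weight = { 2048, 0 }, and move a task of weight 1024 from CPU 0 to CPU 1. Then toy_effective_load(&g, 0, -1024, -1024) is 0, because the group still has all of its remaining weight on CPU 0, while toy_effective_load(&g, 1, 1024, 1024) is 341. A single estimate applied with opposite signs to both sides cannot represent that asymmetry.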

Signed-off-by: Rik van Riel <riel@xxxxxxxxxx>
Cc: mgorman@xxxxxxx
Cc: chegu_vinod@xxxxxx
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Link: http://lkml.kernel.org/r/1403538378-31571-3-git-send-email-riel@xxxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
kernel/sched/fair.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f287d0b..d6526d2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1151,6 +1151,7 @@ static void task_numa_compare(struct task_numa_env *env,
 	struct rq *src_rq = cpu_rq(env->src_cpu);
 	struct rq *dst_rq = cpu_rq(env->dst_cpu);
 	struct task_struct *cur;
+	struct task_group *tg;
 	long src_load, dst_load;
 	long load;
 	long imp = (groupimp > 0) ? groupimp : taskimp;
@@ -1225,14 +1226,21 @@ static void task_numa_compare(struct task_numa_env *env,
 	 * In the overloaded case, try and keep the load balanced.
 	 */
 balance:
-	load = task_h_load(env->p);
-	dst_load = env->dst_stats.load + load;
-	src_load = env->src_stats.load - load;
+	src_load = env->src_stats.load;
+	dst_load = env->dst_stats.load;
+
+	/* Calculate the effect of moving env->p from src to dst. */
+	load = env->p->se.load.weight;
+	tg = task_group(env->p);
+	src_load += effective_load(tg, env->src_cpu, -load, -load);
+	dst_load += effective_load(tg, env->dst_cpu, load, load);

 	if (cur) {
-		load = task_h_load(cur);
-		dst_load -= load;
-		src_load += load;
+		/* Cur moves in the opposite direction. */
+		load = cur->se.load.weight;
+		tg = task_group(cur);
+		src_load += effective_load(tg, env->src_cpu, load, load);
+		dst_load += effective_load(tg, env->dst_cpu, -load, -load);
 	}

 	if (load_too_imbalanced(src_load, dst_load, env))