[tip: sched/core] sched/numa: Bias swapping tasks based on their preferred node

From: tip-bot2 for Mel Gorman
Date: Mon Feb 24 2020 - 10:21:12 EST


The following commit has been merged into the sched/core branch of tip:

Commit-ID: 88cca72c9673e631b63eca7a1dba4a9722a3f414
Gitweb: https://git.kernel.org/tip/88cca72c9673e631b63eca7a1dba4a9722a3f414
Author: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
AuthorDate: Mon, 24 Feb 2020 09:52:22
Committer: Ingo Molnar <mingo@xxxxxxxxxx>
CommitterDate: Mon, 24 Feb 2020 11:36:39 +01:00

sched/numa: Bias swapping tasks based on their preferred node

When swapping tasks for NUMA balancing, it is preferred that tasks move
to or remain on their preferred node. When considering an imbalance,
encourage tasks to move to their preferred node and discourage tasks from
moving away from their preferred node.

Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
Acked-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
Cc: Juri Lelli <juri.lelli@xxxxxxxxxx>
Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
Cc: Valentin Schneider <valentin.schneider@xxxxxxx>
Cc: Phil Auld <pauld@xxxxxxxxxx>
Cc: Hillf Danton <hdanton@xxxxxxxx>
Link: https://lore.kernel.org/r/20200224095223.13361-13-mgorman@xxxxxxxxxxxxxxxxxxx
---
kernel/sched/fair.c | 43 +++++++++++++++++++++++++++++++++++++------
1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 050c1b1..8c1ac01 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1741,18 +1741,27 @@ static void task_numa_compare(struct task_numa_env *env,
goto unlock;
}

+ /* Skip this swap candidate if cannot move to the source cpu. */
+ if (!cpumask_test_cpu(env->src_cpu, cur->cpus_ptr))
+ goto unlock;
+
+ /*
+ * Skip this swap candidate if it is not moving to its preferred
+ * node and the best task is.
+ */
+ if (env->best_task &&
+ env->best_task->numa_preferred_nid == env->src_nid &&
+ cur->numa_preferred_nid != env->src_nid) {
+ goto unlock;
+ }
+
/*
* "imp" is the fault differential for the source task between the
* source and destination node. Calculate the total differential for
* the source task and potential destination task. The more negative
* the value is, the more remote accesses that would be expected to
* be incurred if the tasks were swapped.
- */
- /* Skip this swap candidate if cannot move to the source cpu */
- if (!cpumask_test_cpu(env->src_cpu, cur->cpus_ptr))
- goto unlock;
-
- /*
+ *
* If dst and source tasks are in the same NUMA group, or not
* in any group then look only at task weights.
*/
@@ -1779,6 +1788,19 @@ static void task_numa_compare(struct task_numa_env *env,
task_weight(cur, env->dst_nid, dist);
}

+ /* Discourage picking a task already on its preferred node */
+ if (cur->numa_preferred_nid == env->dst_nid)
+ imp -= imp / 16;
+
+ /*
+ * Encourage picking a task that moves to its preferred node.
+ * This potentially makes imp larger than it's maximum of
+ * 1998 (see SMALLIMP and task_weight for why) but in this
+ * case, it does not matter.
+ */
+ if (cur->numa_preferred_nid == env->src_nid)
+ imp += imp / 8;
+
if (maymove && moveimp > imp && moveimp > env->best_imp) {
imp = moveimp;
cur = NULL;
@@ -1786,6 +1808,15 @@ static void task_numa_compare(struct task_numa_env *env,
}

/*
+ * Prefer swapping with a task moving to its preferred node over a
+ * task that is not.
+ */
+ if (env->best_task && cur->numa_preferred_nid == env->src_nid &&
+ env->best_task->numa_preferred_nid != env->src_nid) {
+ goto assign;
+ }
+
+ /*
* If the NUMA importance is less than SMALLIMP,
* task migration might only result in ping pong
* of tasks and also hurt performance due to cache