[tip:sched/core] sched/numa: Check all nodes when placing a pseudo-interleaved group

From: tip-bot for Rik van Riel
Date: Tue Oct 28 2014 - 07:06:33 EST


Commit-ID: 9de05d48711cd5314920ed05f873d84eaf66ccf1
Gitweb: http://git.kernel.org/tip/9de05d48711cd5314920ed05f873d84eaf66ccf1
Author: Rik van Riel <riel@xxxxxxxxxx>
AuthorDate: Thu, 9 Oct 2014 17:27:47 -0400
Committer: Ingo Molnar <mingo@xxxxxxxxxx>
CommitDate: Tue, 28 Oct 2014 10:47:52 +0100

sched/numa: Check all nodes when placing a pseudo-interleaved group

In pseudo-interleaved numa_groups, all tasks try to relocate to
the group's preferred_nid. When a group is spread across multiple
NUMA nodes, this can lead to tasks swapping their location with
other tasks inside the same group, instead of swapping location with
tasks from other NUMA groups. This can keep NUMA groups from converging.

Examining all nodes, when dealing with a task in a pseudo-interleaved
NUMA group, avoids this problem. Note that only CPUs in nodes that
improve the task or group score are examined, so the added cost of
the loop is limited.

Tested-by: Vinod Chegu <chegu_vinod@xxxxxx>
Signed-off-by: Rik van Riel <riel@xxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Cc: "Vinod Chegu" <chegu_vinod@xxxxxx>
Cc: mgorman@xxxxxxx
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Link: http://lkml.kernel.org/r/20141009172747.0d97c38c@xxxxxxxxxxxxxxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
kernel/sched/fair.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7760c2a..ec32c26d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1436,8 +1436,15 @@ static int task_numa_migrate(struct task_struct *p)
 	/* Try to find a spot on the preferred nid. */
 	task_numa_find_cpu(&env, taskimp, groupimp);
 
-	/* No space available on the preferred nid. Look elsewhere. */
-	if (env.best_cpu == -1) {
+	/*
+	 * Look at other nodes in these cases:
+	 * - there is no space available on the preferred_nid
+	 * - the task is part of a numa_group that is interleaved across
+	 *   multiple NUMA nodes; in order to better consolidate the group,
+	 *   we need to check other locations.
+	 */
+	if (env.best_cpu == -1 || (p->numa_group &&
+	    nodes_weight(p->numa_group->active_nodes) > 1)) {
 		for_each_online_node(nid) {
 			if (nid == env.src_nid || nid == p->numa_preferred_nid)
 				continue;
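
For readers who want to experiment with the new condition outside the
kernel, the following is a minimal user-space sketch, not the kernel code.
It assumes a plain bitmask in place of the kernel's nodemask_t, and every
name in it (nodemask_bits, mask_weight, should_scan_other_nodes,
candidate_nodes) is hypothetical. The second function models the commit
message's remark that only nodes improving the task or group score are
examined; treating a negative improvement on both scores as "skip" is an
assumption of this sketch, since that part of the loop is not shown in the
hunk above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for nodemask_t: one bit per NUMA node. */
typedef unsigned int nodemask_bits;

/* Count set bits; plays the role of nodes_weight() in this sketch. */
int mask_weight(nodemask_bits mask)
{
	int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Mirrors the patched condition: look at other nodes when no CPU was
 * found on the preferred node, or when the task belongs to a group that
 * is active on more than one node.
 */
bool should_scan_other_nodes(int best_cpu, bool has_group,
			     nodemask_bits active_nodes)
{
	return best_cpu == -1 || (has_group && mask_weight(active_nodes) > 1);
}

/*
 * Collect the nodes such a scan would consider. The source and preferred
 * nodes are skipped, as in the diff. Assumption: a node is also skipped
 * when both the task and the group improvement are negative.
 * Returns the number of node ids written to out[].
 */
int candidate_nodes(int nr_nodes, int src_nid, int preferred_nid,
		    const long *taskimp, const long *groupimp, int *out)
{
	int count = 0;

	assert(nr_nodes >= 0);
	for (int nid = 0; nid < nr_nodes; nid++) {
		if (nid == src_nid || nid == preferred_nid)
			continue;
		if (taskimp[nid] < 0 && groupimp[nid] < 0)
			continue;
		out[count++] = nid;
	}
	return count;
}
```

With a group active on nodes 0 and 2 (mask 0x5), should_scan_other_nodes()
returns true even when a CPU was already found on the preferred node, which
is the behavior change this patch introduces. Before the patch, only the
best_cpu == -1 case triggered the scan.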