[PATCH 6/6] sched,numa: check all nodes when placing a pseudo-interleaved group

From: riel
Date: Fri Oct 17 2014 - 03:31:12 EST


From: Rik van Riel <riel@xxxxxxxxxx>

In pseudo-interleaved numa_groups, all tasks try to relocate to
the group's preferred_nid. When a group is spread across multiple
NUMA nodes, this can lead to tasks swapping their location with
other tasks inside the same group, instead of having the group
converge on a few nodes.

When placing a task in a pseudo-interleaved numa_group, it pays
to examine all nodes, to see whether it is better to move to e.g.
the #2 node, so that group locality can be improved.

Signed-off-by: Rik van Riel <riel@xxxxxxxxxx>
Tested-by: Chegu Vinod <chegu_vinod@xxxxxx>
---
kernel/sched/fair.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
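
For readers who want to poke at the new condition outside the kernel,
here is a rough userspace model of it. This is only a sketch and not the
kernel code: mask_weight(), should_scan_other_nodes() and
count_candidate_nodes() are hypothetical names, a 64-bit integer stands in
for the nodemask, and the real loop evaluates each candidate node rather
than merely counting them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for nodes_weight(): count the set bits in a mask. */
static int mask_weight(uint64_t mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Model of the condition added by this patch.
 *   best_cpu == -1  : no spot was found on the preferred node
 *   has_group       : models p->numa_group != NULL
 *   active_nodes    : bitmask modelling p->numa_group->active_nodes
 */
static bool should_scan_other_nodes(int best_cpu, bool has_group,
				    uint64_t active_nodes)
{
	return best_cpu == -1 ||
	       (has_group && mask_weight(active_nodes) > 1);
}

/*
 * Count the nodes the scan would visit: every online node except the
 * source node and the preferred node (assumes at most 64 nodes).
 */
static int count_candidate_nodes(uint64_t online_nodes, int src_nid,
				 int preferred_nid)
{
	int nid, n = 0;

	assert(src_nid >= 0 && src_nid < 64);
	assert(preferred_nid >= 0 && preferred_nid < 64);

	for (nid = 0; nid < 64; nid++) {
		if (!(online_nodes & (UINT64_C(1) << nid)))
			continue;
		if (nid == src_nid || nid == preferred_nid)
			continue;
		n++;
	}
	return n;
}
```

In this model, a task that already found a CPU on the preferred node
still triggers the scan when its group is active on more than one node,
which is the behavioural change this patch makes.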

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b973d77..00c1137 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1425,8 +1425,15 @@ static int task_numa_migrate(struct task_struct *p)
 	/* Try to find a spot on the preferred nid. */
 	task_numa_find_cpu(&env, taskimp, groupimp);

-	/* No space available on the preferred nid. Look elsewhere. */
-	if (env.best_cpu == -1) {
+	/*
+	 * Look at other nodes in these cases:
+	 * - there is no space available on the preferred_nid
+	 * - the task is part of a numa_group that is interleaved across
+	 *   multiple NUMA nodes; in order to better consolidate the group,
+	 *   we need to check other locations.
+	 */
+	if (env.best_cpu == -1 || (p->numa_group &&
+	    nodes_weight(p->numa_group->active_nodes) > 1)) {
 		for_each_online_node(nid) {
 			if (nid == env.src_nid || nid == p->numa_preferred_nid)
 				continue;
--
1.9.3
