[PATCH v2 3/4] sched: Fix task_numa_migrate to always update preferred node

From: Srikar Dronamraju
Date: Tue Jun 16 2015 - 07:56:52 EST


In task_numa_migrate(), env.dst_nid points to either a preferred node or
a node that has free capacity and has more task weight than the current
node.

Currently in such a scenario, the preferred node is updated only after
checking whether tasks in the numa_group have previously run on the node
that has free capacity. Commit c1ceac62 ("sched/numa: Reduce conflict
between fbq_classify_rq() and migration") gives preference to the
preferred node while load balancing. Hence, if setting the preferred
node is skipped after this evaluation, the task might later miss the
opportunity to move to the preferred node at load balancing time.

In such a scenario, it makes sense to unconditionally set env.dst_nid as
the preferred node unless the said node is already the preferred node.
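
As an illustration only, the new update rule can be modelled in
userspace as below. This is a sketch, not kernel code: struct toy_task
and toy_update_preferred() are hypothetical names, the in_numa_group
flag stands in for the p->numa_group check, and the plain assignment
stands in for sched_setnuma().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the task_struct fields used by this rule. */
struct toy_task {
	int numa_preferred_nid;	/* node the task currently prefers */
	bool in_numa_group;	/* models p->numa_group != NULL */
};

/*
 * Model of the updated rule: a task that belongs to a numa_group adopts
 * dst_nid as its preferred node unless dst_nid is already preferred.
 * Returns true when the preferred node was changed.
 */
static bool toy_update_preferred(struct toy_task *p, int dst_nid)
{
	if (!p->in_numa_group)
		return false;
	if (dst_nid == p->numa_preferred_nid)
		return false;

	/* The kernel calls sched_setnuma(p, env.dst_nid) at this point. */
	p->numa_preferred_nid = dst_nid;
	return true;
}
```

In this model the active_nodes test from the old code no longer
appears; the only remaining condition is that the destination differs
from the current preferred node.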

While here, skip a candidate node unless both task and groups benefit,
so that the check matches the comment in the code.
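
For illustration, the difference between the old and the new node
filter can be written as two small predicates. This is a userspace
sketch; skip_node_old() and skip_node_new() are hypothetical helper
names, and taskimp/groupimp carry the same meaning as in
task_numa_migrate().

```c
#include <stdbool.h>

/* Old check: skip a node only when task and group both get worse. */
static bool skip_node_old(long taskimp, long groupimp)
{
	return taskimp < 0 && groupimp < 0;
}

/* New check: skip a node when either the task or the group gets worse. */
static bool skip_node_new(long taskimp, long groupimp)
{
	return taskimp < 0 || groupimp < 0;
}
```

With taskimp = -5 and groupimp = 10, the old check keeps the node as a
candidate while the new check skips it. A node with zero improvement on
both counts is kept by either check, since only negative values are
filtered out.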

Signed-off-by: Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx>
---
kernel/sched/fair.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7b23efa..d1aa374 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1503,7 +1503,7 @@ static int task_numa_migrate(struct task_struct *p)
 			/* Only consider nodes where both task and groups benefit */
 			taskimp = task_weight(p, nid, dist) - taskweight;
 			groupimp = group_weight(p, nid, dist) - groupweight;
-			if (taskimp < 0 && groupimp < 0)
+			if (taskimp < 0 || groupimp < 0)
 				continue;

 			env.dist = dist;
@@ -1519,16 +1519,9 @@ static int task_numa_migrate(struct task_struct *p)
 	 * and is migrating into one of the workload's active nodes, remember
 	 * this node as the task's preferred numa node, so the workload can
 	 * settle down.
-	 * A task that migrated to a second choice node will be better off
-	 * trying for a better one later. Do not set the preferred node here.
 	 */
 	if (p->numa_group) {
-		if (env.best_cpu == -1)
-			nid = env.src_nid;
-		else
-			nid = env.dst_nid;
-
-		if (node_isset(nid, p->numa_group->active_nodes))
+		if (env.dst_nid != p->numa_preferred_nid)
 			sched_setnuma(p, env.dst_nid);
 	}

--
1.8.3.1
