Re: [PATCH] sched,numa: document and fix numa_preferred_nid setting

From: Srikar Dronamraju
Date: Mon Jun 22 2015 - 12:13:37 EST

> + * migrating the task to where it really belongs.
> + * The exception is a task that belongs to a large numa_group, which
> + * spans multiple NUMA nodes. If that task migrates into one of the
> + * workload's active nodes, remember that node as the task's
> + * numa_preferred_nid, so the workload can settle down.
> */
> if (p->numa_group) {
> if (env.best_cpu == -1)
> @@ -1513,7 +1520,7 @@ static int task_numa_migrate(struct task_struct *p)
> nid = env.dst_nid;
> if (node_isset(nid, p->numa_group->active_nodes))
> - sched_setnuma(p, env.dst_nid);
> + sched_setnuma(p, nid);
> }
> /* No better CPU than the current one was found. */

By "Modified Rik's patch" I mean Rik's patch with the node_isset() check
before sched_setnuma() removed. With that change, the numa02 and 1 JVM per
System regressions are somewhat reduced, while the numbers with 2 JVM and
4 JVM per System stay as good as with Rik's patch.

The idea behind removing the node_isset() check is:
The active_nodes mask tested by node_isset() is mostly used to track memory
movement to nodes where cpus are running, and not vice versa. This is as per
the comment in update_numa_active_node_mask(). There could be a situation
where the task's memory is all on one node and that node has the capacity to
accommodate the task, but no tasks associated with this task have run enough
on that node. In such a case, we shouldn't rule out migrating the task to
that node.
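
To make the difference concrete, here is a minimal userspace sketch, not
kernel code. The types numa_group_model and task_model and the helpers
node_is_active(), set_preferred_checked() and set_preferred_unchecked() are
hypothetical stand-ins I made up for illustration; only the shape of the
check follows the hunk quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
#define MAX_NODES 8

struct numa_group_model {
	/* Bit n set: tasks of the group have run enough on node n. */
	unsigned long active_nodes;
};

struct task_model {
	struct numa_group_model *numa_group;
	int numa_preferred_nid;
};

/* Models node_isset(nid, p->numa_group->active_nodes). */
static bool node_is_active(int nid, const struct numa_group_model *ng)
{
	assert(nid >= 0 && nid < MAX_NODES);
	return (ng->active_nodes >> nid) & 1UL;
}

/* Quoted hunk: remember nid only if it is one of the group's active nodes. */
static void set_preferred_checked(struct task_model *p, int nid)
{
	if (p->numa_group && node_is_active(nid, p->numa_group))
		p->numa_preferred_nid = nid;
}

/* Modified variant: remember nid even if the group is not yet active there. */
static void set_preferred_unchecked(struct task_model *p, int nid)
{
	if (p->numa_group)
		p->numa_preferred_nid = nid;
}
```

Take a group whose active_nodes covers only node 0 and a task whose
preferred nid starts at 0, and call both helpers with nid = 2 (the node
holding the task's memory). The checked variant leaves numa_preferred_nid
at 0, while the unchecked variant sets it to 2, which is the case described
above.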

Thanks and Regards
Srikar Dronamraju
