Re: [PATCH] sched,numa: document and fix numa_preferred_nid setting

From: Rik van Riel
Date: Thu Jun 18 2015 - 12:07:00 EST


On 06/18/2015 11:55 AM, Srikar Dronamraju wrote:
>> 	if (p->numa_group) {
>> 		if (env.best_cpu == -1)
>> @@ -1513,7 +1520,7 @@ static int task_numa_migrate(struct task_struct *p)
>> 			nid = env.dst_nid;
>>
>> 		if (node_isset(nid, p->numa_group->active_nodes))
>> -			sched_setnuma(p, env.dst_nid);
>> +			sched_setnuma(p, nid);
>> 	}
>>
>> 	/* No better CPU than the current one was found. */
>>
>
> Overall this patch does seem to produce better results. However, numa02
> is negatively affected.

OK, that is kind of expected.

The way numa02 runs means that, on a two node system, both nodes
are essentially guaranteed to end up in the numa_group's
active_nodes.

What the above change does is slow down migration if a task ends
up in a NUMA node in p->numa_group->active_nodes.

This is necessary if a very large workload has already converged
on a set of NUMA nodes, but it does slow down convergence for such
workloads.
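
To make the effect concrete, here is a rough userspace sketch of the
decision before and after the change. It is not the kernel code: the
nodemask is modelled as a plain 64-bit mask, the function and type names
are made up for illustration, and the "nid = env.src_nid" branch is an
assumption, since that line is elided from the quoted hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for numa_group->active_nodes: bit n set = node n active. */
typedef uint64_t nodemask_sketch_t;

static bool node_in_mask(int nid, nodemask_sketch_t mask)
{
	assert(nid >= 0 && nid < 64);
	return (mask >> nid) & 1u;
}

/* Assumed selection: stay on the source node when no better CPU was found. */
static int pick_nid(int best_cpu, int src_nid, int dst_nid)
{
	return (best_cpu == -1) ? src_nid : dst_nid;
}

/*
 * Both helpers return the node that would be handed to sched_setnuma(),
 * or -1 when the preferred node is left untouched.
 */
static int preferred_nid_before_patch(int best_cpu, int src_nid, int dst_nid,
				      nodemask_sketch_t active_nodes)
{
	int nid = pick_nid(best_cpu, src_nid, dst_nid);

	/* Old behaviour: test nid, but always set env.dst_nid. */
	return node_in_mask(nid, active_nodes) ? dst_nid : -1;
}

static int preferred_nid_after_patch(int best_cpu, int src_nid, int dst_nid,
				     nodemask_sketch_t active_nodes)
{
	int nid = pick_nid(best_cpu, src_nid, dst_nid);

	/* New behaviour: set the same node that was tested. */
	return node_in_mask(nid, active_nodes) ? nid : -1;
}
```

With both nodes of a two node system active (mask 0x3) and no better
CPU found (best_cpu == -1), the old code would return node 1 for a task
sitting on node 0, while the new code returns node 0. The task's
preferred node becomes the node it is already on, which is where the
slowdown in movement comes from.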

I can't think of any obvious way to both slow down movement once
things have converged, yet keep speedy movement of tasks when they
have not yet converged.

It is worth noting that all the numa01 and numa02 benchmarks
measure is the speed at which the workloads converge. They do not
measure the overhead of making things converge, or how fast an
actual workload runs (NUMA locality benefit, minus NUMA placement
overhead).

> KernelVersion: 4.1.0-rc7-tip
> Testcase:              Min       Max       Avg    StdDev
> elapsed_numa01:     858.85    949.18    915.64     33.06
> elapsed_numa02:      23.09     29.89     26.43      2.18
> Testcase:              Min       Max       Avg    StdDev
> system_numa01:     1516.72   1855.08   1686.24    113.95
> system_numa02:       63.69     79.06     70.35      5.87
> Testcase:              Min       Max       Avg    StdDev
> user_numa01:      73284.76  80818.21  78060.88   2773.60
> user_numa02:       1690.18   2071.07   1821.64    140.25
> Testcase:              Min       Max       Avg    StdDev
> total_numa01:     74801.50  82572.60  79747.12   2875.61
> total_numa02:      1753.87   2142.77   1891.99    143.59
>
> KernelVersion: 4.1.0-rc7-tip + your patch
>
> Testcase:              Min       Max       Avg    StdDev   %Change
> elapsed_numa01:     665.26    877.47    776.77     79.23    15.83%
> elapsed_numa02:      24.59     31.30     28.17      2.48    -5.56%
> Testcase:              Min       Max       Avg    StdDev   %Change
> system_numa01:      659.57   1220.99    942.36    234.92    60.92%
> system_numa02:       44.62     86.01     64.64     14.24     6.64%
> Testcase:              Min       Max       Avg    StdDev   %Change
> user_numa01:      56280.95  75908.81  64993.57   7764.30    17.21%
> user_numa02:       1790.35   2155.02   1916.12    132.57    -4.38%
> Testcase:              Min       Max       Avg    StdDev   %Change
> total_numa01:     56940.50  77128.20  65935.92   7993.49    17.91%
> total_numa02:      1834.97   2227.03   1980.76    136.51    -3.99%
>
