Re: [PATCH 0/2] numa,sched: resolve conflict between load balancing and NUMA balancing

From: Artem Bityutskiy
Date: Fri May 29 2015 - 03:45:09 EST


On Wed, 2015-05-27 at 15:04 -0400, riel@xxxxxxxxxx wrote:
> A previous attempt to resolve a major conflict between load balancing and
> NUMA balancing, changeset 095bebf61a46 ("sched/numa: Do not move past the
> balance point if unbalanced"), introduced its own problems.
>
> Revert that changeset, and introduce a new fix, which actually seems to
> resolve the issues observed in Jirka's tests.
>
> In a test where the main thread creates a large memory area, spawns
> a worker thread to iterate over that memory (the worker is placed on
> another node by select_task_rq_fair), and then sleeps while waiting
> for the worker to loop over all of the memory, the worker thread is
> now migrated to where the memory is, instead of all the memory being
> migrated over as before.
>
> Jirka has run a number of performance tests on several systems:
> single-instance SpecJBB 2005 performance is 7-15% higher on a 4-node
> system, with higher gains on systems with more cores per socket.
> Multi-instance SpecJBB 2005 (one per node), linpack, and stream see
> little or no change with the revert of 095bebf61a46 and this patch.
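The test described in the cover letter can be sketched roughly as
follows. This only illustrates the memory-access pattern and is not
Jirka's actual test: the function name run_numa_test, the 16 MiB
default size, the pass count and the 4 KiB page size are all
hypothetical choices. Whether the scheduler moves the worker or NUMA
balancing moves the memory can only be observed on a multi-node
machine, for example by watching /proc/<pid>/numa_maps while it runs.

```python
import threading

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as is typical on x86-64


def run_numa_test(size_bytes=16 * 1024 * 1024, passes=4):
    """Hypothetical sketch of the access pattern from the cover letter."""
    # The main thread creates a large memory area. Writing one byte per
    # page from the main thread means every page is touched here first,
    # so under the default (local) memory policy the pages end up on the
    # main thread's node.
    buf = bytearray(size_bytes)
    for offset in range(0, size_bytes, PAGE_SIZE):
        buf[offset] = 1

    result = {}

    def worker():
        # The worker repeatedly walks the whole memory area.
        total = 0
        for _ in range(passes):
            total += sum(buf)
        result["total"] = total

    t = threading.Thread(target=worker)
    t.start()
    # The main thread "goes to sleep" by blocking until the worker is done.
    t.join()
    return result["total"]


if __name__ == "__main__":
    print(run_numa_test())
```

The return value (one per touched page, per pass) only exists so the
sketch has something checkable; the interesting part is which node the
worker runs on while the main thread is blocked in join().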

[Re-sending, since the first copy did not reach the mailing list
because it was sent as HTML.]

Tested-by: Artem Bityutskiy <artem.bityutskiy@xxxxxxxxxxxxxxx>

Hi Rik,

I ran our eCommerce web workload benchmark again. Last time I did not
revert 095bebf61a46; this time I tested this whole patch-set. Let me
summarize all the results so far.

Here is the average web server response time, in milliseconds, for
various kernels (lower is better):

1. v4.1-rc1: 1450 ms
2. v4.1-rc1 + a43455a1d572daf7b730fe12eb747d1e17411365 reverted: 300 ms
3. v4.1-rc1 + NUMA disabled: 480 ms
4. v4.1-rc5 + this patch-set: 1260 ms

So, as you can see, for our workload, reverting
a43455a1d572daf7b730fe12eb747d1e17411365 makes the web server the most
responsive (as a reminder, this is a 2-socket Haswell-EP server).

Simply disabling NUMA also gives a big improvement, but not as big as
reverting the patch that is "offending" from our workload's point of
view.

This patch-set also gives a noticeable improvement (1450 ms down to
1260 ms), though a much smaller one than the other two options.
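To put these numbers in relative terms, here is a small Python
calculation of the reduction in response time compared with plain
v4.1-rc1. It only reuses the four figures reported above (the labels
and the function name reduction_pct are illustrative). Note that the
patch-set row is based on v4.1-rc5 rather than v4.1-rc1, so that
comparison is not strictly like-for-like.

```python
# Average web server response times (ms), as reported above.
RESULTS_MS = {
    "v4.1-rc1": 1450,
    "v4.1-rc1 + a43455a1d572 reverted": 300,
    "v4.1-rc1 + NUMA disabled": 480,
    "v4.1-rc5 + this patch-set": 1260,
}
BASELINE = "v4.1-rc1"


def reduction_pct(baseline_ms, value_ms):
    """Percentage reduction in response time relative to the baseline."""
    return 100.0 * (baseline_ms - value_ms) / baseline_ms


if __name__ == "__main__":
    base = RESULTS_MS[BASELINE]
    for name, ms in RESULTS_MS.items():
        print(f"{name:35s} {ms:5d} ms  {reduction_pct(base, ms):5.1f}% lower")
```

This works out to roughly 79.3% lower for the revert, 66.9% lower with
NUMA disabled, and 13.1% lower with this patch-set.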

Artem.
