Re: [PATCH 4/4] sched/numa: Do not move imbalanced load purely on the basis of an idle CPU

From: Srikar Dronamraju
Date: Wed Sep 12 2018 - 02:55:06 EST


* Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> [2018-09-10 10:41:47]:

> On Fri, Sep 07, 2018 at 01:37:39PM +0100, Mel Gorman wrote:
> > > Srikar's patch here:
> > >
> > > http://lkml.kernel.org/r/1533276841-16341-4-git-send-email-srikar@xxxxxxxxxxxxxxxxxx
> > >
> > > Also frobs this condition, but in a less radical way. Does that yield
> > > similar results?
> >
> > I can check. I do wonder of course if the less radical approach just means
> > that automatic NUMA balancing and the load balancer simply disagree about
> > placement at a different time. It'll take a few days to have an answer as
> > the battery of workloads to check this take ages.
> >
>
> Tests completed over the weekend and I've found that the performance of
> both patches are very similar for two machines (both 2 socket) running a
> variety of workloads. Hence, I'm not worried about which patch gets picked
> up. However, I would prefer my own on the grounds that the additional
> complexity does not appear to get us anything. Of course, that changes if
> Srikar's tests on his larger ppc64 machines show the more complex approach
> is justified.
>

Here are my results from running SPECjbb2005. Higher bops are better.

Kernel A = v4.18 + 13 sched patches that are part of v4.19-rc1.
Kernel B = Kernel A + 6 patches (http://lore.kernel.org/lkml/1533276841-16341-1-git-send-email-srikar@xxxxxxxxxxxxxxxxxx)
Kernel C = Kernel B - (Avoid task migration for small numa improvement), i.e.
http://lore.kernel.org/lkml/1533276841-16341-4-git-send-email-srikar@xxxxxxxxxxxxxxxxxx
+ 2 patches from Mel:
(Do not move imbalanced load purely)
http://lore.kernel.org/lkml/20180907101139.20760-5-mgorman@xxxxxxxxxxxxxxxxxxx
(Stop comparing tasks for NUMA placement)
http://lore.kernel.org/lkml/20180907101139.20760-4-mgorman@xxxxxxxxxxxxxxxxxxx

To me, Kernel B, which is the 13 patches accepted in v4.19-rc1 + the 6 patches
posted for review, seems to be giving better performance.

The numbers for each kernel are compared to the previous kernel, i.e.
for Kernel A, v4.18 is prev
for Kernel B, Kernel A is prev
for Kernel C, Kernel B is prev
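
For reference, the %Change column in the tables below appears to be the usual relative difference against the previous kernel. A minimal sketch of that calculation, assuming this formula (the helper name pct_change is only illustrative and was not part of the test setup):

```python
def pct_change(prev: float, current: float) -> float:
    """Relative change of `current` versus `prev`, in percent.

    Assumed formula: (current - prev) / prev * 100.
    A positive value means more bops than the previous kernel.
    """
    return (current - prev) / prev * 100.0


# Example: Kernel A vs v4.18 on the 2 node x86 Haswell box, 4 JVMs.
# Inputs are the Prev and Current bops taken from that table row.
print(pct_change(203769, 209790))
```

With the 4 JVM Haswell numbers this gives roughly 2.95, in line with the value reported for Kernel A.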

2 node x86 Haswell

v4.18 or 94710cac0ef4
JVMS  Prev      Current   %Change
4     -         203769    -
1     -         316734    -

Kernel A
JVMS  Prev      Current   %Change
4     203769    209790    2.95482
1     316734    312377    -1.3756

Kernel B
JVMS  Prev      Current   %Change
4     209790    202059    -3.68511
1     312377    326987    4.67704

Kernel C
JVMS  Prev      Current   %Change
4     202059    200681    -0.681979
1     326987    316715    -3.14141

================================================


4 Node / 2 Socket PowerNV / Power 8

v4.18 or 94710cac0ef4
JVMS  Prev      Current   %Change
8     -         88411.9   -
1     -         222075    -

Kernel A
JVMS  Prev      Current   %Change
8     88411.9   88733.5   0.363752
1     222075    214607    -3.36283

Kernel B
JVMS  Prev      Current   %Change
8     88733.5   89952     1.37321
1     214607    217226    1.22037

Kernel C
JVMS  Prev      Current   %Change
8     89952     89912.9   -0.0434676
1     217226    219281    0.946019


================================================


2 Node / 2 Socket Power 9 / PowerNV

v4.18 or 94710cac0ef4
JVMS  Prev      Current   %Change
4     -         195989    -
1     -         202854    -

Kernel A
JVMS  Prev      Current   %Change
4     195989    193108    -1.46998
1     202854    204042    0.585643

Kernel B
JVMS  Prev      Current   %Change
4     193108    196422    1.71614
1     204042    211219    3.51741

Kernel C
JVMS  Prev      Current   %Change
4     196422    195052    -0.697478
1     211219    207854    -1.59313


================================================

4 Node / 4 Socket Power 7 PhyP LPAR.

v4.18 or 94710cac0ef4
JVMS  Prev      Current   %Change
8     -         52826.9   -
1     -         103103    -

Kernel A
JVMS  Prev      Current   %Change
8     52826.9   59504.4   12.6403
1     103103    102542    -0.544116

Kernel B
JVMS  Prev      Current   %Change
8     59504.4   61674.8   3.64746
1     102542    108211    5.52847

Kernel C
JVMS  Prev      Current   %Change
8     61674.8   57946.5   -6.04509
1     108211    104533    -3.39892