Re: [PATCH v2 0/5] remove runnable_load_avg and improve group_classify

From: Mel Gorman
Date: Sat Feb 15 2020 - 16:58:35 EST

Next message: Pavel Begunkov: "[PATCH v2 0/5] async punting improvements for io_uring"
Previous message: kbuild test robot: "[rcu:dev.2020.02.13b] BUILD SUCCESS 860b1b7edde9e9f699440de5b6ae91cdeb987708"
In reply to: Vincent Guittot: "[PATCH v2 5/5] sched/fair: Take into account runnable_avg to classify group"

On Fri, Feb 14, 2020 at 04:27:24PM +0100, Vincent Guittot wrote:
> This new version stays quite close to the previous one and should
> replace without problems the previous one that is part of Mel's patchset:
> https://lkml.org/lkml/2020/2/14/156
>

As far as I can see, the differences are harmless and look sane. I do think
that an additional fix is mandatory as I see no reason why the regression
was fixed. As such, I'll release a v3 of the series that includes your
new patches with the minimal fix inserted where appropriate. I'll have
tests running over the rest of the weekend.

> Some hackbench results:
>
> - small arm64 dual quad cores system
> hackbench -l (2560/#grp) -g #grp
>
> grp    tip/sched/core      +patchset          improvement
> 1      1,327(+/-10,06 %)   1,247(+/-5,45 %)   5,97 %
> 4      1,250(+/- 2,55 %)   1,207(+/-2,12 %)   3,42 %
> 8      1,189(+/- 1,47 %)   1,179(+/-1,93 %)   0,90 %
> 16     1,221(+/- 3,25 %)   1,219(+/-2,44 %)   0,16 %
>
> - large arm64 2 nodes / 224 cores system
> hackbench -l (256000/#grp) -g #grp
>
> grp    tip/sched/core       +patchset            improvement
> 1      14,197(+/- 2,73 %)   13,917(+/- 2,19 %)    1,98 %
> 4       6,817(+/- 1,27 %)    6,523(+/-11,96 %)    4,31 %
> 16      2,930(+/- 1,07 %)    2,911(+/- 1,08 %)    0,66 %
> 32      2,735(+/- 1,71 %)    2,725(+/- 1,53 %)    0,37 %
> 64      2,702(+/- 0,32 %)    2,717(+/- 1,07 %)   -0,53 %
> 128     3,533(+/-14,66 %)    3,123(+/-12,47 %)   11,59 %
> 256     3,918(+/-19,93 %)    3,390(+/- 5,93 %)   13,47 %
>
> The significant improvement for 128 and 256 should be taken with care
> because of some instabilities over iterations without the patchset.
>
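
For anyone wanting to rerun the quoted configurations, here is a minimal
sketch of how the "hackbench -l (2560/#grp) -g #grp" invocation expands
per group count. The helper name hackbench_cmd is hypothetical, and using
integer division for the loop count is an assumption; the quoted text
only gives the ratio.

```python
import shlex


def hackbench_cmd(total_loops: int, groups: int) -> list[str]:
    """Build the argv for one run: hackbench -l (total_loops/groups) -g groups."""
    # Assumption: the loop count is the integer quotient of the total
    # and the number of groups, so total work stays roughly constant.
    loops = total_loops // groups
    return ["hackbench", "-l", str(loops), "-g", str(groups)]


if __name__ == "__main__":
    # Group counts from the small arm64 table; the large system used
    # a total of 256000 with groups from 1 to 256.
    for grp in (1, 4, 8, 16):
        print(shlex.join(hackbench_cmd(2560, grp)))
```

The script only prints the command lines, it does not run hackbench.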

For the most part I do not see results similar to the ones you quote with
hackbench, with one exception -- first-generation EPYC. I don't have results
for EPYC 2 yet, but I'm curious whether the machine you have has multiple L3
caches per NUMA domain. Various Intel CPU generations show improvements but
they're not as dramatic. Tests will tell me for sure but I have some
confidence that it'll look like:

  Small tracing patches  -- no difference
  Vincent patches 1-2    -- regressions
  Fix from Mel           -- small overall improvement on baseline
  Vincent patches 3-5    -- small improvements mostly, sometimes big ones
                            on hackbench depending on the machine
  Rest of Mel series     -- generally ok across machines and CPU generations

Even if the improvements are not dramatic, I think it'll be worth it to
have the NUMA and CPU balancers use similarly sane logic, and overall I find
the load balancer easier to understand with the new logic, so yey!
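
As a rough aid for reading the quoted tables, the improvement column looks
like the relative reduction of the hackbench result against tip/sched/core.
That formula is an inference, not something stated in the thread, and the
rounded table values only reproduce the column approximately. A minimal
sketch, with decimal commas converted to points and a hypothetical helper
name:

```python
def improvement_pct(baseline: float, patched: float) -> float:
    """Relative reduction of the patched result versus the baseline, in percent.

    Assumes lower is better, so a positive value means the patchset helped.
    """
    return (baseline - patched) / baseline * 100.0


if __name__ == "__main__":
    # (groups, tip/sched/core, +patchset) taken from the large arm64 table.
    rows = [(4, 6.817, 6.523), (64, 2.702, 2.717), (256, 3.918, 3.390)]
    for grp, base, patched in rows:
        print(f"grp={grp:<4} improvement={improvement_pct(base, patched):6.2f} %")
```

Given the +/- ranges at 128 and 256 groups, the computed percentages there
sit well inside the run-to-run noise.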

--
Mel Gorman
SUSE Labs
