Re: [PATCH 1/2] sched/fair: Couple wakee flips with heavy wakers
From: Mike Galbraith
Date: Tue Oct 26 2021 - 22:09:51 EST
On Tue, 2021-10-26 at 14:13 +0200, Mike Galbraith wrote:
> On Tue, 2021-10-26 at 12:57 +0100, Mel Gorman wrote:
> >
> > The patch in question was also tested on other workloads on NUMA
> > machines. For a 2-socket machine (20 cores, HT enabled so 40 CPUs)
> > running specjbb 2005 with one JVM per NUMA node, the patch also
> > scaled reasonably well
>
> That's way more interesting. No idea what this thing does under
> the hood thus whether it should be helped or not, but at least it's a
> real deal benchmark vs a kernel hacker tool.
...
Installing test specjbb
specjvm-install: Fetching from mirror http://mcp/mmtests-mirror/spec/SPECjbb2005_kitv1.07.tar.gz
specjvm-install: Fetching from internet NOT_AVAILABLE/SPECjbb2005_kitv1.07.tar.gz
specjvm-install: Fetching from alt internet /SPECjbb2005_kitv1.07.tar.gz
FATAL specjvm-install: specjvm-install: Could not download /SPECjbb2005_kitv1.07.tar.gz
FATAL specjbb-bench: specjbb install script returned error
FATAL: specjbb returned failure, unable to continue
FATAL: Installation step failed for specjbb
Hohum, so much for trying to take a peek.
At any rate, unlike the tbench numbers, these have the look of signal
rather than test jig noise, and pretty strong signal at that, so maybe
patchlet should fly. At the very least, it appears to be saying that
there is significant performance to be had by some means.
Bah, fly or die little patchlet. Either way there will be winners and
losers, that's just the way it works if you're not shaving cycles.
> > specjbb
> > 5.15.0-rc3 5.15.0-rc3
> > vanilla sched-wakeeflips-v1r1
> > Hmean tput-1 50044.48 ( 0.00%) 53969.00 * 7.84%*
> > Hmean tput-2 106050.31 ( 0.00%) 113580.78 * 7.10%*
> > Hmean tput-3 156701.44 ( 0.00%) 164857.00 * 5.20%*
> > Hmean tput-4 196538.75 ( 0.00%) 218373.42 * 11.11%*
> > Hmean tput-5 247566.16 ( 0.00%) 267173.09 * 7.92%*
> > Hmean tput-6 284981.46 ( 0.00%) 311007.14 * 9.13%*
> > Hmean tput-7 328882.48 ( 0.00%) 359373.89 * 9.27%*
> > Hmean tput-8 366941.24 ( 0.00%) 393244.37 * 7.17%*
> > Hmean tput-9 402386.74 ( 0.00%) 433010.43 * 7.61%*
> > Hmean tput-10 437551.05 ( 0.00%) 475756.08 * 8.73%*
> > Hmean tput-11 481349.41 ( 0.00%) 519824.54 * 7.99%*
> > Hmean tput-12 533148.45 ( 0.00%) 565070.21 * 5.99%*
> > Hmean tput-13 570563.97 ( 0.00%) 609499.06 * 6.82%*
> > Hmean tput-14 601117.97 ( 0.00%) 647876.05 * 7.78%*
> > Hmean tput-15 639096.38 ( 0.00%) 690854.46 * 8.10%*
> > Hmean tput-16 682644.91 ( 0.00%) 722826.06 * 5.89%*
> > Hmean tput-17 732248.96 ( 0.00%) 758805.17 * 3.63%*
> > Hmean tput-18 762771.33 ( 0.00%) 791211.66 * 3.73%*
> > Hmean tput-19 780582.92 ( 0.00%) 819064.19 * 4.93%*
> > Hmean tput-20 812183.95 ( 0.00%) 836664.87 * 3.01%*
> > Hmean tput-21 821415.48 ( 0.00%) 833734.23 ( 1.50%)
> > Hmean tput-22 815457.65 ( 0.00%) 844393.98 * 3.55%*
> > Hmean tput-23 819263.63 ( 0.00%) 846109.07 * 3.28%*
> > Hmean tput-24 817962.95 ( 0.00%) 839682.92 * 2.66%*
> > Hmean tput-25 807814.64 ( 0.00%) 841826.52 * 4.21%*
> > Hmean tput-26 811755.89 ( 0.00%) 838543.08 * 3.30%*
> > Hmean tput-27 799341.75 ( 0.00%) 833487.26 * 4.27%*
> > Hmean tput-28 803434.89 ( 0.00%) 829022.50 * 3.18%*
> > Hmean tput-29 803233.25 ( 0.00%) 826622.37 * 2.91%*
> > Hmean tput-30 800465.12 ( 0.00%) 824347.42 * 2.98%*
> > Hmean tput-31 791284.39 ( 0.00%) 791575.67 ( 0.04%)
> > Hmean tput-32 781930.07 ( 0.00%) 805725.80 ( 3.04%)
> > Hmean tput-33 785194.31 ( 0.00%) 804795.44 ( 2.50%)
> > Hmean tput-34 781325.67 ( 0.00%) 800067.53 ( 2.40%)
> > Hmean tput-35 777715.92 ( 0.00%) 753926.32 ( -3.06%)
> > Hmean tput-36 770516.85 ( 0.00%) 783328.32 ( 1.66%)
> > Hmean tput-37 758067.26 ( 0.00%) 772243.18 * 1.87%*
> > Hmean tput-38 764815.45 ( 0.00%) 769156.32 ( 0.57%)
> > Hmean tput-39 757885.41 ( 0.00%) 757670.59 ( -0.03%)
> > Hmean tput-40 750140.15 ( 0.00%) 760739.13 ( 1.41%)
> >
> > The largest regression was within noise. Most results were outside the
> > noise.
> >
> > Some HPC workloads showed little difference but they do not communicate
> > that heavily. redis microbenchmark showed mostly neutral results.
> > schbench (facebook simulator workload that is latency sensitive) showed a
> > mix of results, but helped more than it hurt. Even the machine with the
> > worst results for schbench showed improved wakeup latencies at the 99th
> > percentile. These were all on NUMA machines.
> >
>
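
For anyone wanting to sanity check the percentage column in the quoted
table, the sketch below recomputes it from the two throughput columns.
It assumes the bracketed figure is simply the relative change of the
patched kernel against vanilla, rounded to two decimals. The helper name
pct_gain is made up for illustration and is not part of mmtests; only
three of the forty rows are copied in.

```python
def pct_gain(baseline: float, patched: float) -> float:
    """Relative change of patched vs. baseline throughput, in percent.

    Assumption: this mirrors how the bracketed column in the quoted
    table was derived; it is not taken from the mmtests sources.
    """
    return round((patched / baseline - 1.0) * 100.0, 2)


# A few rows copied from the quoted specjbb table:
# (label, vanilla throughput, sched-wakeeflips-v1r1 throughput)
rows = [
    ("tput-1", 50044.48, 53969.00),
    ("tput-35", 777715.92, 753926.32),
    ("tput-39", 757885.41, 757670.59),
]

for label, vanilla, patched in rows:
    print(f"{label:8s} {pct_gain(vanilla, patched):6.2f}%")
```

The three rows come out at 7.84%, -3.06% and -0.03%, matching the table.
Note this says nothing about which rows are outside the noise; the
asterisks in the table come from the original report, not from this
arithmetic.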