Re: [LKP] [x86, sched] 1567c3e346: vm-scalability.median -15.8% regression

From: Giovanni Gherdovich
Date: Thu Jul 09 2020 - 08:43:45 EST


On Tue, 2020-07-07 at 10:58 +0800, Xing Zhengjun wrote:
>
> On 6/12/2020 4:11 PM, Xing Zhengjun wrote:
> > Hi Giovanni,
> >
> > I test the regression, it still existed in v5.7. Do you have time
> > to take a look at this? Thanks.
> >
>
> Ping...
>

Hello,

I haven't sat down to reproduce this yet but I've read the benchmark code and
configuration, and this regression seems likely to be more of a benchmarking
artifact than an actual performance bug.

Likely a benchmarking artifact:

First off, the test used the "performance" governor from the "intel_pstate"
cpufreq driver, but points at the patch introducing the "frequency invariance
on x86" feature as the culprit. This is suspicious because "frequency
invariance on x86" influences frequency selection when the "schedutil" governor
is in use (not your case). It may also affect the scheduler load balancing but
here you have $NUM_CPUS processes so there isn't a lot of room for creativity
there, each CPU gets a process.

Some notes on this benchmark for my future reference:

The test in question is "anon-cow-seq" from "vm-scalability", which is based
on the "usemem" program originally written by Andrew Morton and exercises the
memory management subsystem. The invocation is:

usemem --nproc $NUM_CPUS \
--prealloc \
--prefault \
$SIZE

What this does is to create an anonymous mmap()-ing of $SIZE bytes in the main
process, fork $NUM_CPUS distinct child processes and have all of them scan the
mapping sequentially from byte 0 to byte N, writing 0, 1, 2, ..., N on the
region as they scan it, all together at the same time. So we have the "anon"
part (the mapping isn't file-backed), the "cow" part (the parent process
allocates the region, then each children copy-on-write's to it) and the "seq"
part (memory accesses happen sequentially from low to high address). The test
measures how quick this happens; I believe the regression happens in the
median time it takes a process to finish (or the median throughput, but $SIZE
is fixed so it's equivalent).

The $SIZE parameter is selected so that there is enough space for everybody:
each children plus the parent need a copy of the mapped region, so that makes
$NUM_CPUS+1 instances. The formula for $SIZE adds a factor 2 for good measure:

SIZE = $MEM_SIZE / ($NUM_CPUS + 1) / 2

So we have a benchmark dominated by page allocation and copying, run with the
"performance" cpufreq governor, and your bisections points to a commit such as
1567c3e3467cddeb019a7b53ec632f834b6a9239 ("x86, sched: Add support for
frequency invariance") which:

* changes how frequency is selected by a governor you're not using
* doesn't touch the memory management subsystem or related functions

I'm not entirely dismissing your finding, just explaining why this analysis
hasn't been in my top priorities lately (plus, I've just returned from a 3
weeks vacation :). I'm curious too about what causes the test to go red, but
I'm not overly worried given the above context.


Thanks,
Giovanni Gherdovich