Re: RE: RE: [PATCH] mm: Add RWH_RMAP_EXCLUDE flag to exclude files from rmap sharing

From: Mateusz Guzik

Date: Fri Apr 24 2026 - 03:08:29 EST


On Fri, Apr 24, 2026 at 8:20 AM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>
> On Fri, Apr 24, 2026 at 5:20 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Fri, Apr 24, 2026 at 01:08:35AM +0000, Yibin Liu wrote:
> > > On an Intel Emerald Rapids server (112 cores), run the execl benchmark from
> > > UnixBench with the command: ./Run -c 220 execl
> > > Then perf top shows:
> > >
> > > 91.53% [kernel] [k] osq_lock
> > > 0.50% [kernel] [k] rwsem_spin_on_owner
> >
> > OK, but does this represent a realistic workload? It's pretty easy to
> > construct workloads that hammer on particular locks; the question is
> > whether it's a relevant performance bottleneck that customers care about.
>
> This is a genuine problem when doing large-scale package building.
> I'll say upfront that I have extensive experience with this crap on
> FreeBSD; I did not run it on Linux myself, but bear with me here --
> while FreeBSD is no doubt the less scalable kernel, Linux has been
> shown to suffer from the same problems.
>
> Say you have a box with 100 cores and get it to build up to 100
> packages at a time. Further, even if you use some form of file-system
> separation at the userspace level, you still want to share the common
> binaries to reduce the memory and cache footprint, so you at least
> bind-mount them (--bind). Then you are susceptible to contention
> issues, at least on paper.
>
> Granted, building a pig like Chromium scales great because it is
> written in C++ and almost all of the time is spent in userspace, with
> forks and execs of the compiler spread far apart in time, which in turn
> puts very little pressure on the locks.
>
> However, the vast majority of packages are tiny in comparison
> (literally a few .c files), and this is where things go south: they
> engage in an exec frenzy that looks like a borderline microbenchmark.
> The primary culprit is configure scripts, which issue an idiotic number
> of back-to-back execs of short-lived processes (notably sed, but also
> grep, rm and others). There is a lot of evil in makefiles as well.
>
> I don't have numbers handy, but in the case of the FreeBSD ports tree
> we are talking about over 10 000 ports which take only a few seconds
> each to build. Since these builds are largely single-threaded, if you
> have package-building machinery which can saturate the box, you easily
> end up with the number of parallel builds matching your core count.
> And when they engage in an exec frenzy for the whole duration, you may
> as well be microbenchmarking it.
>
> A sufficiently pessimized workload is indistinguishable from a
> microbenchmark and this here is an example of one.
>
> iow this is a real problem, but I don't have specific numbers for Linux.

I had the gcc sources handy on Linux, so I ran its configure script by
hand. It is all autotools-generated, so the general pattern matches the
small packages as well.

It resulted in the following execs (count, path):
285 /usr/bin/sed
95 /usr/bin/rm
88 /usr/bin/grep
77 /usr/bin/cat
30 /usr/bin/gcc
27 /usr/bin/expr
23 /usr/libexec/gcc/x86_64-linux-gnu/15/cc1
21 /usr/bin/as
10 /usr/bin/uname
8 /usr/libexec/gcc/x86_64-linux-gnu/15/collect2
8 /usr/bin/ld
8 /bin/bash
7 /usr/bin/mv
7 /usr/bin/c++
6 /usr/bin/mkdir
6 /usr/bin/basename
5 /usr/bin/ln
5 /usr/bin/cmp
4 /usr/bin/sort
4 /usr/bin/dirname
3 /usr/bin/rmdir
3 /usr/bin/hostname
3 /usr/bin/gawk
3 /usr/bin/cc
2 /usr/bin/strip
2 /usr/bin/mktemp
2 /usr/bin/ls
2 /usr/bin/egrep
2 /usr/bin/cp
2 /usr/bin/ar
2 /bin/sh
1 /usr/lib/llvm-20/bin/clang
1 /usr/bin/tr
1 /usr/bin/touch
1 /usr/bin/ranlib
1 /usr/bin/install
1 /usr/bin/diff
1 /usr/bin/chmod
1 /usr/bin/arch
1 /home/mjg/repos/gcc/missing
1 /bin/uname
1 /bin/arch
1 ./contrib/compare-debug
1 ./conftest
1 ./configure
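The message doesn't say how the list above was collected. One possible way, assuming strace is available, is to trace execve across the whole process tree with `strace -f -e trace=execve -o trace.txt ./configure` and tally the successful calls. The helpers below (`count_execs`, `report`) are hypothetical; they assume strace's default output without timestamps, and they skip failed PATH lookups (`= -1 ENOENT`) so only real execs are counted.

```python
import re
from collections import Counter
from typing import Iterable

# Start of an execve call, optionally prefixed by a pid ("123 " or "[pid 123] ").
CALL_RE = re.compile(r'^(?:\[pid\s+)?(\d+)?\]?\s*execve\("([^"]*)"')
# Second half of a call that strace split into "<unfinished ...>" / "resumed".
RESUMED_RE = re.compile(r'^(?:\[pid\s+)?(\d+)?\]?\s*<\.\.\. execve resumed>')
# A successful execve returns 0; failures look like "= -1 ENOENT (...)".
SUCCESS_RE = re.compile(r'=\s*0\s*$')


def count_execs(lines: Iterable[str]) -> Counter:
    """Count successful execve calls per path in `strace -f` output (sketch)."""
    counts: Counter = Counter()
    pending: dict[str, str] = {}  # pid -> path of an unfinished execve
    for raw in lines:
        line = raw.rstrip("\n")
        if m := CALL_RE.match(line):
            pid, path = m.group(1) or "", m.group(2)
            if line.endswith("<unfinished ...>"):
                pending[pid] = path  # result arrives on a later "resumed" line
            elif SUCCESS_RE.search(line):
                counts[path] += 1
        elif m := RESUMED_RE.match(line):
            path = pending.pop(m.group(1) or "", None)
            if path is not None and SUCCESS_RE.search(line):
                counts[path] += 1
    return counts


def report(counts: Counter) -> str:
    """Format as "<count> <path>" lines, most frequent first."""
    return "\n".join(f"{n} {path}" for path, n in counts.most_common())
```

For example, `print(report(count_execs(open("trace.txt"))))` prints one `<count> <path>` line per binary, in the same shape as the list above.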

Almost half of the runtime was spent in the kernel, and that is without
any contention.
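The message doesn't say how the kernel share was measured; `time ./configure` reports the same user/sys split. The sketch below assumes "runtime" means CPU time and computes sys / (user + sys) for a command and all of its reaped descendants via `getrusage(RUSAGE_CHILDREN)`. The helper name `kernel_share` and the script name in the comment are hypothetical.

```python
import resource
import subprocess
import sys
import time


def kernel_share(argv: list[str]) -> dict[str, float]:
    """Run argv to completion; report user vs. system CPU time it consumed.

    RUSAGE_CHILDREN only counts children that have exited and been waited
    for; per getrusage(2) that includes grandchildren as long as every
    intermediate process reaped its own children, which shells do.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    cpu = user + system
    return {
        "wall_s": wall,
        "user_s": user,
        "sys_s": system,
        # Share of CPU time spent in the kernel on behalf of the process tree.
        "sys_fraction": system / cpu if cpu > 0 else 0.0,
    }


if __name__ == "__main__":
    # e.g. python3 kshare.py ./configure   (hypothetical file name)
    r = kernel_share(sys.argv[1:])
    print(f"user {r['user_s']:.2f}s  sys {r['sys_s']:.2f}s  "
          f"kernel share {r['sys_fraction']:.0%}  wall {r['wall_s']:.2f}s")
```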

So imagine doing all of that just to compile a few .c files, then rinse
and repeat thousands of times in parallel on 100+ cores.
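The scenario above (many independent builds, each exec'ing short-lived tools back to back) can be approximated with a small load generator. This is a hypothetical sketch, not the UnixBench execl test from the quoted report: each worker forks and execs a separate tiny binary (`true` by default) and waits for it, which is closer to what a configure script does. Worker and iteration counts are assumptions to tune; pointing `binary` at a bind-mounted path would mimic the shared-binaries setup described earlier.

```python
import os
import shutil
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def exec_burst(path: str, count: int) -> int:
    """Spawn `path` `count` times back to back; return how many exited with 0."""
    ok = 0
    for _ in range(count):
        # Each run forks/execs a fresh, short-lived process and waits for it,
        # mimicking a configure script calling sed or grep in sequence.
        if subprocess.run([path], stdout=subprocess.DEVNULL).returncode == 0:
            ok += 1
    return ok


def exec_frenzy(workers: int, per_worker: int, binary: str = "true") -> tuple[int, float]:
    """Run `workers` independent exec loops concurrently (hypothetical helper).

    Returns (successful execs, wall-clock seconds). Threads are good enough
    for a sketch: most of the time goes into the spawned processes, and
    subprocess.run releases the GIL while it waits for each child.
    """
    path = shutil.which(binary)
    if path is None:
        raise FileNotFoundError(f"{binary!r} not found in PATH")
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        done = sum(pool.map(exec_burst, [path] * workers, [per_worker] * workers))
    return done, time.monotonic() - start


if __name__ == "__main__":
    n = os.cpu_count() or 1
    total, secs = exec_frenzy(workers=n, per_worker=1000)
    print(f"{total} execs across {n} workers in {secs:.2f}s ({total / secs:.0f}/s)")
```

Running it with one worker per core while watching `perf top` is one way to see whether exec-path locks show up; no results are implied here.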