Re: [PATCH] kbuild: pass jobserver to cmd_ld_vmlinux.o

From: Fangrui Song
Date: Fri Jun 17 2022 - 16:06:07 EST


On 2022-06-18, Masahiro Yamada wrote:
(+LLVM list, Fangrui Song)

Thanks for tagging me. I'll clarify some stuff.

On Fri, Jun 17, 2022 at 7:41 PM Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:

On Fri, Jun 17, 2022 at 12:35 PM Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
>
> On Fri, Jun 17, 2022 at 12:53 AM Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
> >
> > On Thu, Jun 16, 2022 at 4:09 PM Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
> > >
> > > On Thu, Jun 16, 2022 at 12:45 PM Jiri Slaby <jslaby@xxxxxxx> wrote:
> > > >
> > > > Until the link-vmlinux.sh split (cf. the commit below), the linker was
> > > > run with jobserver set in MAKEFLAGS. After the split, the command in
> > > > Makefile.vmlinux_o is not prefixed by "+" anymore, so this information
> > > > is lost.
> > > >
> > > > Restore it as linkers working in parallel (esp. the LTO ones) make a use
> > > > of i

Hi Jiri,

Please let me clarify first.

Is it OK to assume you are talking about Clang LTO here,
rather than GCC LTO, since the latter is not upstream?





I tested this patch but I did not see any performance change for Clang LTO.


[1] CONFIG_CLANG_LTO_FULL

lld always runs sequential.
It never runs in parallel, even if you pass the -j option to Make.

"lld always runs sequential" is not accurate. There are a number of
parallel linker passes. ld.lld --threads= defaults to
llvm::hardware_concurrency (similar to
https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency,
but uses sched_getaffinity to compute the number of available cores).

"lld always runs sequential" is correct only when --threads=1 is
specified or the system provides only one thread to the lld process.
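
To illustrate the difference between "cores in the machine" and "cores
available to this process", here is a minimal Python sketch. It is not
lld's code; available_cpus() is a hypothetical helper name, and it only
mirrors the idea of consulting the affinity mask before falling back to
the total CPU count.

```python
import os


def available_cpus():
    """Return the number of CPUs this process is allowed to run on."""
    try:
        # Linux: honours the affinity mask (e.g. set via taskset),
        # which can be smaller than the number of CPUs in the machine.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the total count.
        return os.cpu_count() or 1


if __name__ == "__main__":
    print("CPUs in the system:     ", os.cpu_count())
    print("CPUs usable by this pid:", available_cpus())
```

Running the script under something like "taskset -c 0" should report a
single usable CPU, which is the case where a default-configured linker
would end up with one thread.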

I think people may be more interested in LTO parallelism here. Regular
LTO (sometimes called full LTO when there is mixed-thin-and-regular LTO)
supports limited parallelism, which applies to code generation but not
to IR-level optimization. (IR-level optimization has many interprocedural
optimization passes, so splitting the module would make LTO less
effective. Code generation is per function, so parallelizing it does
not regress optimization.)


[2] CONFIG_CLANG_LTO_THIN

lld always runs in parallel, even if you do not pass the -j option.

On my machine, lld always allocated 12 threads.
This is irrespective of the Make parallelism.




One more thing, if a program wants to participate in
Make's jobserver, it must parse MAKEFLAGS, and extract
file descriptors to be used to communicate to the jobserver.

As a code example in the kernel tree,
scripts/jobserver-exec parses "MAKEFLAGS" and "--jobserver".
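
As a rough illustration of that parsing step, here is a minimal Python
sketch. It is not the code from scripts/jobserver-exec;
parse_jobserver_auth() is a hypothetical name, and the sketch assumes
the POSIX forms "--jobserver-auth=R,W", the older "--jobserver-fds=R,W",
and the "fifo:PATH" form used by newer GNU make. Other forms (such as
the Windows semaphore name) are not handled.

```python
import re


def parse_jobserver_auth(makeflags):
    """Extract jobserver info from a MAKEFLAGS string.

    Returns ("fds", (read_fd, write_fd)), ("fifo", path),
    or None when no jobserver option is present.
    """
    # Older GNU make spelled the option --jobserver-fds; newer ones
    # use --jobserver-auth. Accept both.
    matches = re.findall(r"--jobserver-(?:auth|fds)=(\S+)", makeflags)
    if not matches:
        return None

    value = matches[-1]  # if the option is repeated, use the last one
    if value.startswith("fifo:"):
        # Named-pipe form: the client opens this path itself.
        return ("fifo", value[len("fifo:"):])

    # Pipe form: two inherited file descriptors, "R,W".
    read_fd, write_fd = value.split(",")
    return ("fds", (int(read_fd), int(write_fd)))


if __name__ == "__main__":
    import os
    print(parse_jobserver_auth(os.environ.get("MAKEFLAGS", "")))
```

A real client would then read one byte from the read side for each extra
job it wants to run and write the same byte back when that job finishes.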


I grepped the lld source code, but it does not contain
"MAKEFLAGS" or "jobserver".

masahiro@oscar:~/ref/lld$ git remote show origin
* remote origin
  Fetch URL: https://github.com/llvm-mirror/lld.git
  Push  URL: https://github.com/llvm-mirror/lld.git
  HEAD branch: master
  Remote branches:
    master     tracked
    release_36 tracked
    release_37 tracked
    release_38 tracked
    release_39 tracked
    release_40 tracked
    release_50 tracked
    release_60 tracked
    release_70 tracked
    release_80 tracked
    release_90 tracked
  Local branch configured for 'git pull':
    master merges with remote master
  Local ref configured for 'git push':
    master pushes to master (up to date)
masahiro@oscar:~/ref/lld$ git grep MAKEFLAGS
masahiro@oscar:~/ref/lld$ git grep jobserver


So, in my research, LLD does not seem to support the jobserver.


Correct. lld does not support GNU make's jobserver. On the other hand,
I don't think the jobserver implementation supports a flexible "give this
target N hardware concurrency" request. A heavy link target does not
necessarily get more resources than a quick target.

If a make target knows how much hardware concurrency it gets, we can
pass --threads= to lld. LTO easily takes 95+% of the link time, so LTO
parallelism may need a dedicated setting. lld has --thinlto-jobs=.
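
As a sketch of what that could look like, assuming some wrapper already
knows its thread budget: lld_parallel_flags() below is a hypothetical
helper, not part of kbuild or lld. The only real names it uses are the
two lld options mentioned above.

```python
def lld_parallel_flags(threads, thinlto_jobs=None):
    """Build the lld options for a given thread budget.

    threads:      value for --threads= (general linker parallelism)
    thinlto_jobs: optional value for --thinlto-jobs= (ThinLTO backends);
                  left out when None so lld keeps its own default.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")

    flags = [f"--threads={threads}"]
    if thinlto_jobs is not None:
        flags.append(f"--thinlto-jobs={thinlto_jobs}")
    return flags


if __name__ == "__main__":
    # Example: a link step that was granted 4 threads in total.
    print(" ".join(lld_parallel_flags(4, thinlto_jobs=4)))
```

How the budget itself would be computed (for example, from the make -j
value) is left open here; that is the part the jobserver does not
provide today.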




If you are talking about GCC LTO, yes, the code
tries to parse "--jobserver-auth=" from the MAKEFLAGS
environment variable. [1]

[1]: https://github.com/gcc-mirror/gcc/blob/releases/gcc-12.1.0/gcc/lto-wrapper.cc#L1341


But, as you may know, GCC LTO works in a different way,
at least, we cannot do it before modpost.


--
Best Regards
Masahiro Yamada