Re: [PATCH] kbuild: pass jobserver to cmd_ld_vmlinux.o

From: Sedat Dilek
Date: Sat Jun 18 2022 - 02:13:52 EST



On Fri, Jun 17, 2022 at 10:05 PM Fangrui Song <maskray@xxxxxxxxxx> wrote:
>
> On 2022-06-18, Masahiro Yamada wrote:
> >(+LLVM list, Fangrui Song)
>
> Thanks for tagging me. I'll clarify some stuff.
>
> >On Fri, Jun 17, 2022 at 7:41 PM Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
> >>
> >> On Fri, Jun 17, 2022 at 12:35 PM Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
> >> >
> >> > On Fri, Jun 17, 2022 at 12:53 AM Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
> >> > >
> >> > > On Thu, Jun 16, 2022 at 4:09 PM Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
> >> > > >
> >> > > > On Thu, Jun 16, 2022 at 12:45 PM Jiri Slaby <jslaby@xxxxxxx> wrote:
> >> > > > >
> >> > > > > Until the link-vmlinux.sh split (cf. the commit below), the linker was
> >> > > > > run with jobserver set in MAKEFLAGS. After the split, the command in
> >> > > > > Makefile.vmlinux_o is not prefixed by "+" anymore, so this information
> >> > > > > is lost.
> >> > > > >
> >> > > > > Restore it, as linkers working in parallel (esp. the LTO ones) make use
> >> > > > > of i
> >
> >Hi Jiri,
> >
> >Please let me clarify first.
> >
> >Here, is it OK to assume you are talking about Clang LTO
> >instead of GCC LTO, because the latter is not upstreamed?
> >
> >
> >
> >
> >
> >I tested this patch but I did not see any performance change for Clang LTO.
> >
> >
> >[1] CONFIG_CLANG_LTO_FULL
> >
> > lld always runs sequential.
> > It never runs in parallel even if you pass the -j option to Make.
>
> "lld always runs sequential" is not accurate. There are a number of
> parallel linker passes. ld.lld --threads= defaults to
> llvm::hardware_concurrency (similar to
> https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency,
> but uses sched_getaffinity to compute the number of available cores).
>
> "lld always runs sequential" is correct only when --threads=1 is
> specified or the system provides only one thread to the lld process.
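
To illustrate the difference between "number of cores in the machine" and
"number of cores this process may use", here is a minimal Python sketch.
It is only an analogue of the behaviour described above, not lld's code;
the helper name available_cpus() is made up for this example.

```python
import os


def available_cpus() -> int:
    """Return how many CPUs the current process is allowed to run on."""
    try:
        # Linux: respects the affinity mask (e.g. one set with taskset).
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the total count.
        return os.cpu_count() or 1


if __name__ == "__main__":
    print("total CPUs:    ", os.cpu_count())
    print("available CPUs:", available_cpus())
```

Running this under something like "taskset -c 0 python3 script.py" should
report a single available CPU even on a many-core machine, which is the
case where a linker defaulting to the affinity-aware count would run
single-threaded.
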
>
> I think people may be more interested in LTO parallelism here. Regular
> LTO (sometimes called full LTO when there is mixed-thin-and-regular LTO)
> supports limited parallelism which applies to code generation, but not
> IR-level optimization. (IR-level optimization has many interprocedural
> optimization passes. Splitting will make LTO less effective. Code
> generation is per function, so parallelism does not regress
> optimization.)
>
> >
> >[2] CONFIG_CLANG_LTO_THIN
> >
> > lld always runs in parallel even if you do not pass -j option
> >
> > On my machine, lld always allocated 12 threads.
> > This is irrespective of Make's parallelism.
> >
> >
> >
> >
> >One more thing, if a program wants to participate in
> >Make's jobserver, it must parse MAKEFLAGS, and extract
> >file descriptors to be used to communicate with the jobserver.
> >
> >As a code example in the kernel tree,
> >scripts/jobserver-exec parses "MAKEFLAGS" and "--jobserver".
> >
> >
> >I grepped the lld source code, but it does not contain
> >"MAKEFLAGS" or "jobserver".
>
> >masahiro@oscar:~/ref/lld$ git remote show origin
> >* remote origin
> > Fetch URL: https://github.com/llvm-mirror/lld.git
> > Push URL: https://github.com/llvm-mirror/lld.git
> > HEAD branch: master
> > Remote branches:
> > master tracked
> > release_36 tracked
> > release_37 tracked
> > release_38 tracked
> > release_39 tracked
> > release_40 tracked
> > release_50 tracked
> > release_60 tracked
> > release_70 tracked
> > release_80 tracked
> > release_90 tracked
> > Local branch configured for 'git pull':
> > master merges with remote master
> > Local ref configured for 'git push':
> > master pushes to master (up to date)
> >masahiro@oscar:~/ref/lld$ git grep MAKEFLAGS
> >masahiro@oscar:~/ref/lld$ git grep jobserver
> >
> >
> >So, in my research, LLD does not seem to support the jobserver.
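
For reference, the MAKEFLAGS parsing step a jobserver client has to do
could look roughly like the sketch below. This is not the code from
scripts/jobserver-exec; the function name parse_jobserver_auth() is
hypothetical. It assumes the option spellings GNU make is known to use:
--jobserver-auth=R,W, the older --jobserver-fds=R,W, and the newer
--jobserver-auth=fifo:PATH form. Treating negative descriptors as "no
jobserver" is a defensive choice of this sketch.

```python
import re
from typing import Optional, Tuple, Union

# Either a (read_fd, write_fd) pair or the path of a named pipe.
JobserverAuth = Union[Tuple[int, int], str]


def parse_jobserver_auth(makeflags: str) -> Optional[JobserverAuth]:
    """Extract the jobserver handle advertised in a MAKEFLAGS string."""
    auth = None
    for word in makeflags.split():
        m = re.match(r"--jobserver-(?:auth|fds)=(.+)$", word)
        if m:
            auth = m.group(1)  # keep the last occurrence
    if auth is None:
        return None  # no jobserver advertised
    if auth.startswith("fifo:"):
        return auth[len("fifo:"):]  # named-pipe style
    m = re.fullmatch(r"(-?\d+),(-?\d+)", auth)
    if m is None:
        return None  # unrecognized format
    rfd, wfd = int(m.group(1)), int(m.group(2))
    if rfd < 0 or wfd < 0:
        return None  # assumption: negative fds mean "not usable here"
    return rfd, wfd


if __name__ == "__main__":
    import os
    print(parse_jobserver_auth(os.environ.get("MAKEFLAGS", "")))
```

A client would then read one byte from the read side for each extra job
it wants to run and write that byte back when the job finishes. Without
the "+" prefix on the recipe, make may not keep the descriptors open for
the child at all, so a real client would also have to check that they are
valid.
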
>
>
> Correct. lld does not support GNU make's jobserver. On the other hand,
> I don't think the jobserver implementation supports flexible "give this
> target N hardware concurrency". A heavy link target does not necessarily
> get more resources than a quick target.
>
> If a make target knows how much hardware concurrency it gets, we can
> pass --threads= to lld. LTO easily takes 95+% of link time, so LTO
> parallelism may need a dedicated setting. lld has --thinlto-jobs=.
>

Hey Fangrui,

I played a bit with --thinlto-jobs=4 yesterday.

$ cat 0001-vmlinux-clang-thinlto-Add-thinlto-jobs-4-to-KBUILD_L.patch