Re: Very slow clang kernel config ..

From: Nick Desaulniers
Date: Fri Apr 30 2021 - 20:20:30 EST


On Thu, Apr 29, 2021 at 7:22 PM Nick Desaulniers
<ndesaulniers@xxxxxxxxxx> wrote:
>
> On Thu, Apr 29, 2021 at 5:19 PM Nick Desaulniers
> <ndesaulniers@xxxxxxxxxx> wrote:
> >
> > On Thu, Apr 29, 2021 at 2:53 PM Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > I haven't looked into why this is so slow with clang, but it really is
> > > painfully slow:
> > >
> > > time make CC=clang allmodconfig
> > > real 0m2.667s
> > >
> > > vs the gcc case:
> > >
> > > time make CC=gcc allmodconfig
> > > real 0m0.903s
> >
> > Can you provide info about your clang build such as the version
> > string, and whether this was built locally perhaps?
>
> d'oh it was below.
>
> > > This is on my F34 machine:
> > >
> > > clang version 12.0.0 (Fedora 12.0.0-0.3.rc1.fc34)

A quick:
$ perf record -e cycles:pp --call-graph lbr make LLVM=1 LLVM_IAS=1 -j72 allmodconfig
$ perf report --no-children --sort=dso,symbol
shows:
2.35% [unknown] [k] 0xffffffffabc00fc7
+ 2.29% libc-2.31.so [.] _int_malloc
1.24% libc-2.31.so [.] _int_free
+ 1.23% ld-2.31.so [.] do_lookup_x
+ 1.14% libc-2.31.so [.] __strlen_avx2
+ 1.06% libc-2.31.so [.] malloc
+ 1.03% clang-13 [.] llvm::StringMapImpl::LookupBucketFor
1.01% libc-2.31.so [.] __memmove_avx_unaligned_erms
+ 0.76% conf [.] yylex
+ 0.68% clang-13 [.] llvm::Instruction::getNumSuccessors
+ 0.63% libbfd-2.35.2-system.so [.] bfd_hash_lookup
+ 0.63% clang-13 [.] llvm::PMDataManager::findAnalysisPass
+ 0.63% ld-2.31.so [.] _dl_lookup_symbol_x
0.62% libc-2.31.so [.] __memcmp_avx2_movbe
0.60% libc-2.31.so [.] __strcmp_avx2
+ 0.56% clang-13 [.] llvm::ValueHandleBase::AddToUseList
+ 0.56% clang-13 [.] llvm::operator==<llvm::DenseMap<llvm::BasicBlock const*, unsigned int, llvm::DenseMapInfo<llvm::BasicBlock const*>, llvm::detail::Dense
0.53% clang-13 [.] llvm::SmallPtrSetImplBase::insert_imp_big

(Yes, I know about kptr_restrict. Sorry if there's a better way to
share such perf data; don't you need to share perf.data and the same
binary, IIRC?)

The string map lookups look expected: the compiler flags are one very
large string map, though we've previously identified that the hashing
could perhaps be sped up.

llvm::Instruction::getNumSuccessors is unexpected: it looks like
codegen, but this was a trace of `allmodconfig`. I wouldn't be
surprised if this is from LLVM=1 setting HOSTCC=clang; it might be good
to try to isolate those out.

Some other questions that came to mind thinking about this overnight:
- Is Kbuild/make doing more work than is necessary when building with
clang (beyond perhaps a few more cc-option checks)? I don't think perf
is the right tool for profiling GNU make. Even with V=1, make hides a
lot of the work that macros like cc-option are doing.
- Is clang doing more work than necessary for just checking support of
command line flags? Probably. I'm not sure that has been optimized
before, but if we pursue that and the slowdown turns out to be mostly
due to the previous point, that would potentially be a waste of time.
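
To make the second question concrete, here is a minimal sketch of the
kind of probe a cc-option check boils down to, assuming it is roughly
"compile an empty C input with -Werror plus the flag and look at the
exit status". probe_flag is a hypothetical helper, not something in
Kbuild, and the real macro passes more flags than this:

```sh
# Hypothetical helper (not part of Kbuild): succeed if the compiler
# named in $1 accepts the flag in $2.
# Assumption: the probe compiles an empty C input with -Werror plus the
# flag under test; the real cc-option macro adds further flags.
probe_flag() {
    "$1" -Werror "$2" -S -x c /dev/null -o /dev/null >/dev/null 2>&1
}
```

Running a loop of, say, `probe_flag clang -Wall` under bash's `time`,
and the same for gcc, might give a rough per-invocation cost to compare
against the overall config time, outside of make.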
--
Thanks,
~Nick Desaulniers