Re: [PATCH 5/7] kbuild: get rid of duplication in *.mod files

From: Masahiro Yamada
Date: Wed Apr 13 2022 - 04:21:02 EST


On Sat, Apr 9, 2022 at 5:43 AM Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote:
>
> On Thu, Apr 7, 2022 at 5:08 PM Masahiro Yamada <masahiroy@xxxxxxxxxx> wrote:
> >
> > On Fri, Apr 8, 2022 at 2:55 AM Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Apr 6, 2022 at 8:31 AM Masahiro Yamada <masahiroy@xxxxxxxxxx> wrote:
> > > >
> > > > diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> > > > index 6ae92d119dfa..f7a30f378e20 100644
> > > > --- a/scripts/Makefile.build
> > > > +++ b/scripts/Makefile.build
> > > > @@ -303,7 +303,8 @@ $(obj)/%.prelink.o: $(obj)/%.o FORCE
> > > > $(call if_changed,cc_prelink_modules)
> > > > endif
> > > >
> > > > -cmd_mod = echo $(addprefix $(obj)/, $(call real-search, $*.o, .o, -objs -y -m)) > $@
> > > > +cmd_mod = echo $(addprefix $(obj)/, $(call real-search, $*.o, .o, -objs -y -m)) | \
> > > > + $(AWK) -v RS='( |\n)' '!x[$$0]++' > $@
> > >
> > > God AWK is unreadable. Any reason we can't use GNU make's sort builtin?
> > > https://www.gnu.org/software/make/manual/html_node/Text-Functions.html
> >
> >
> > I did that in the previous submission.
> > https://lore.kernel.org/lkml/20220405113359.2880241-8-masahiroy@xxxxxxxxxx/
> >
> >
> > After some thought, I decided to drop duplicates without sorting.
> >
> > If I alphabetically sorted the object list,
> > 7/7 of this series would be impossible.
> >
> >
> > I am not a big fan of AWK, but I do not know a cleaner way.
> > If you know a better idea, please tell me.
>
> ```
> # stable_dedup.py
> from sys import argv
>
> wordset = set()
> argv.pop(0)
> for word in argv: wordset.add(word)
> for word in wordset: print(word)
> ```
> If that ever shows up in a profile of a kernel build, <set> in C++
> looks pretty similar. Then that script can be reused in a couple of
> other places, and has a more descriptive name that hints at what it
> does.
>
> Compare that with `$(AWK) -v RS='( |\n)' '!x[$$0]++'`.


As I said, I want to drop duplicates without changing the argument order.

Your Python code does not preserve the order because it adds the arguments
to a set() first, and iterating over a set does not follow insertion order.


$ cat stable_dedup.py
#!/usr/bin/python3
from sys import argv
wordset = set()
argv.pop(0)
for word in argv: wordset.add(word)
for word in wordset: print(word)

$ ./stable_dedup.py c b a a b
c
a
b



Here, the output I expect is "c b a".


If I am allowed to change the order, I would use
Make's $(sort ...) function or "sort -u" shell command.
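To make the difference concrete, here is a small Python sketch of sort-based
de-duplication, which is roughly what $(sort ...) or "sort -u" would give for
plain ASCII words like these. The duplicates disappear, but so does the
original order: the expected "c b a" becomes "a b c". sort_dedup() is a
hypothetical name used only for illustration.

```python
def sort_dedup(words):
    # Roughly what Make's $(sort ...) does: drop duplicate words and
    # return the rest in lexical order. The original order is lost.
    return sorted(set(words))


if __name__ == "__main__":
    words = "c b a a b".split()
    print(" ".join(sort_dedup(words)))  # a b c
```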




Of course, it is pretty easy to write a python script
that dedups arguments without changing the order.


$ cat dedup-by-python
#!/usr/bin/python3
import sys
wordset = set()

for x in sys.argv[1:]:
    if x not in wordset:
        print(x)
        wordset.add(x)

$ ./dedup-by-python c b a a b
c
b
a
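For reference, the same order-preserving logic can be written more compactly,
assuming Python 3.7 or later, by using dict.fromkeys() instead of an explicit
set. A dict keeps its keys in insertion order, whereas a set iterates in hash
order (and str hashes are randomized per process by default, so the set-based
output order can even change between runs). dedup_keep_order() is a
hypothetical name. This only shortens the script; it does nothing about the
interpreter start-up cost.

```python
#!/usr/bin/python3
import sys


def dedup_keep_order(words):
    # dict keys preserve insertion order (guaranteed since Python 3.7),
    # so each word stays at the position of its first occurrence.
    return list(dict.fromkeys(words))


if __name__ == "__main__":
    for word in dedup_keep_order(sys.argv[1:]):
        print(word)
```

With the same arguments as above, "./dedup_keep_order.py c b a a b" prints
c, b and a on separate lines.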


Even so, I think a Python script is the wrong approach here.


Please note that cmd_mod is invoked once per module,
so this runs many times, especially for allmodconfig.


Python has a large start-up overhead.


The AWK implementation is much faster. perf shows about 1.3 ms
elapsed for AWK versus about 9.7 ms for Python:




[1] AWK implementation

$ cat test-data.txt
c b a a b

$ cat dedup-by-awk
#!/usr/bin/awk -f
BEGIN { RS="( |\n)" }
!x[$0]++ { print($0) }
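Since readability of the AWK one-liner was the concern, here is a Python
sketch that spells out what the dedup-by-awk program above does, assuming
gawk semantics for the multi-character RS (treated as a regular expression).
awk_style_dedup() is a hypothetical name; this is only an explanation of the
idiom, not a proposed replacement.

```python
import re


def awk_style_dedup(text):
    # RS="( |\n)": a single space or a newline ends a record.
    records = re.split(r"[ \n]", text)
    # A separator at the very end of the input terminates the last record
    # rather than starting an empty one, so drop the trailing "".
    if records and records[-1] == "":
        records.pop()
    seen = {}
    out = []
    for rec in records:
        # !x[$0]++ tests the old count, then increments it; only a record
        # whose count was still 0 (its first occurrence) is printed.
        if not seen.get(rec, 0):
            out.append(rec)
        seen[rec] = seen.get(rec, 0) + 1
    return out


if __name__ == "__main__":
    # Same input as test-data.txt above (echo adds the final newline).
    for rec in awk_style_dedup("c b a a b\n"):
        print(rec)
```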


# perf stat -- ./dedup-by-awk < test-data.txt
c
b
a

Performance counter stats for './dedup-by-awk':

1.06 msec task-clock # 0.790 CPUs utilized
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
201 page-faults # 189.755 K/sec
3,671,995 cycles # 3.467 GHz
3,932,770 instructions # 1.07 insn per cycle
754,811 branches # 712.582 M/sec
21,154 branch-misses # 2.80% of all branches
18,350,660 slots # 17.324 G/sec
4,173,875 topdown-retiring # 22.7% retiring
2,230,864 topdown-bad-spec # 12.2% bad speculation
5,757,069 topdown-fe-bound # 31.4% frontend bound
6,188,850 topdown-be-bound # 33.7% backend bound

0.001341605 seconds time elapsed

0.001476000 seconds user
0.000000000 seconds sys



[2] Python implementation

# perf stat -- ./dedup-by-python c b a a b
c
b
a

Performance counter stats for './dedup-by-python c b a a b':

9.34 msec task-clock # 0.967 CPUs utilized
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
756 page-faults # 80.947 K/sec
31,045,653 cycles # 3.324 GHz
39,175,531 instructions # 1.26 insn per cycle
8,488,886 branches # 908.929 M/sec
326,947 branch-misses # 3.85% of all branches
152,587,445 slots # 16.338 G/sec
37,698,074 topdown-retiring # 24.7% retiring
32,911,017 topdown-bad-spec # 21.6% bad speculation
55,051,156 topdown-fe-bound # 36.1% frontend bound
26,927,196 topdown-be-bound # 17.6% backend bound

0.009661105 seconds time elapsed

0.006485000 seconds user
0.003242000 seconds sys
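A back-of-the-envelope estimate from the two "seconds time elapsed" figures
above: each Python run costs roughly 8.3 ms more than the AWK run. The module
count below is a hypothetical number chosen only for illustration, and the
estimate assumes every cmd_mod invocation costs what the single measured run
did. The total is summed over invocations, before make -j spreads them
across CPUs.

```python
# Elapsed times from the two perf stat runs above (seconds, one run each).
AWK_ELAPSED = 0.001341605
PYTHON_ELAPSED = 0.009661105


def extra_seconds(num_modules):
    """Extra time, summed over all cmd_mod invocations, if each one used
    Python instead of AWK (assumes a constant per-invocation cost)."""
    return (PYTHON_ELAPSED - AWK_ELAPSED) * num_modules


if __name__ == "__main__":
    print(f"{PYTHON_ELAPSED / AWK_ELAPSED:.1f}x")  # 7.2x
    # Hypothetical module count, for illustration only.
    print(f"{extra_seconds(5000):.1f} s")  # 41.6 s
```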





> --
> Thanks,
> ~Nick Desaulniers



--
Best Regards
Masahiro Yamada