Re: [PATCH v2] bitmap: Switch from inline to __always_inline

From: Yury Norov
Date: Fri Jul 12 2024 - 20:19:38 EST


> > I like it more, when it's split onto several patches. We already have
> > my bitmap.h rework and your cpumask.h one with their own reasonable
> > motivations. Can you keep that patch structure please? Then, we can add
>
> I'm not quite sure what you mean. I imitated the patch structure of your
> patch 1 that I linked, which had the following file-diff list:
>
> include/linux/bitmap.h | 46 ++++++-------
> include/linux/cpumask.h | 144 +++++++++++++++++++--------------------
> include/linux/find.h | 40 +++++------
> include/linux/nodemask.h | 86 +++++++++++------------
>
> Are you suggesting I should split that into 2 (one for cpumask and one
> for the rest)?

I think the best way would be make a separate patch for each file:

1. find.h - because all others include it for find_bit() functions.
2. bitmap.h - pick my existing patch.
3. cpumasks.h - pick your existing patch.
4. nodemask.h - just the last piece.

> > separate patches for find.h and nodemask.h. If rebasing the old bitmap.h
> > patch isn't clean and takes some effort, feel free to add your
> > co-developed-by.
>
> I don't really care who gets authorship, but I did manually rewrite it,

I do care about authorship. Not for vanity reasons, but because author
is the first person to blame in case there's a reason to blame.

> since there were enough conflicts, and it's basically scriptable. Feel
> free to suggest which Author/S-o-b/Co-developed combo makes sense for
> that, and I can adopt it in v3 :)

Can you keep my authorship and commit message for bitmap.h please, and
make sure that all the functions are __always_inline now? If you have
to re-write it, Co-developed-by is a proper tag.

> > > According to bloat-o-meter, my vmlinux decreased in size by a total of
> > > 1538 bytes (-0.01%) with this change.
> > >
> > > [0] CONFIG_HOTPLUG_PARALLEL=y ('select'ed for x86 as of [1]) and
> > > CONFIG_GCOV_PROFILE_ALL.
> > >
> > > [1] commit 0c7ffa32dbd6 ("x86/smpboot/64: Implement
> > > arch_cpuhp_init_parallel_bringup() and enable it")
> > >
> > > Cc: Yury Norov <yury.norov@xxxxxxxxx>
> > > Signed-off-by: Brian Norris <briannorris@xxxxxxxxxxxx>
> >
> > So... This whole optimization is only worth the reviewer's time if we
> > can prove it's sustainable across different compilers and configurations.
> >
> > Just saying that under some particular config it works, brings little
> > value for those who are not interested in that config. Moreover, if
> > one enables GCOV, he likely doesn't care much about the .text size.
> > And for those using release configs, it only brings uncertainty.
>
> I think I partially disagree. If it's neutral for one compiler but
> helpful for the other, that's still a win. But otherwise, I agree we
> should have some accounting of diversity -- not just a single test.

Sure, I just want to make sure this wouldn't bloat common sane configs.
+/- 100 bytes is a negligible noise. +2M for defconfig is not.

> > Let's test it broader? We've got 2 main compilers - gcc and llvm, and
> > at least two configurations that may be relevant: defconfig and your
> > HOTPLUG_PARALLEL + GCOV_PROFILE_ALL. And it would be nice to prove
> > that the optimization is sustainable across compiler versions.
> >
> > I have the following table in mind:
> >
> > defconfig your conf
> > GCC 9 | -1900 | |
> > CLANG 13 | | |
> > GCC 13 | | |
> > CLANG 18 | -100 | -1538 |
> >
> > The defconfig nubmers above are from my past experiments. And you can
> > add more configs/compilers, of course.
>
> Perhaps I wasn't too clear, but I was actually testing a few things:
>
> * my ChromeOS-y build, with gcov and clang -- to ensure the section
> mismatch warning/error disappears
> * a pure mainline / semi-defaults build (which happened to use my host
> distro's gcc 13), to do size comparisons. I also happened to enable
> gcov in the currently-posted experiments.
>
> But agreed, I can get a larger matrix with a bit more specifity in my
> commit log. TBD on the ease of getting a Clang 13 or GCC 9 toolchain,
> but I think I can convince my distro to provide them.

It looks like minimum Clang version is 14. The rest is good.

Thanks,
Yury

> > > --- a/include/linux/bitmap.h
> > > +++ b/include/linux/bitmap.h
> > > @@ -203,7 +203,7 @@ unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
> > > * the bit offset of all zero areas this function finds is multiples of that
> > > * power of 2. A @align_mask of 0 means no alignment is required.
> > > */
> > > -static inline unsigned long
> > > +static __always_inline unsigned long
> > > bitmap_find_next_zero_area(unsigned long *map,
> > > unsigned long size,
> > > unsigned long start,
> >
> > Let's split them such that return type would be at the same line as
> > the function name. It would help to distinguish function declaration
> > from invocations in grep output:
>
> Ack. I thought I was being consistent with existing cpumask.h style, but
> I got this set wrong. Will fix.
>
> Brian