Re: [PATCH] cpumask: Switch from inline to __always_inline

From: Yury Norov
Date: Mon Jul 08 2024 - 16:13:21 EST


On Mon, Jul 08, 2024 at 12:41:25PM -0700, Brian Norris wrote:
> Hi Yury, Nathan,
>
> On Wed, Jul 03, 2024 at 12:57:24PM -0700, Nathan Chancellor wrote:
> > On Wed, Jul 03, 2024 at 12:06:36PM -0700, Yury Norov wrote:
> > > On Tue, Jun 25, 2024 at 11:27:59AM -0700, Brian Norris wrote:
> > > > On Tue, May 14, 2024 at 01:49:01PM -0700, Brian Norris wrote:
> > > > > This change (plus more) has been previously proposed for other reasons
> > > > > -- that some of the bitmask 'const' machinery doesn't work without
> > > > > inlining -- in the past as:
> > > > >
> > > > > Subject: [PATCH 1/3] bitmap: switch from inline to __always_inline
> > > > > https://lore.kernel.org/all/20221027043810.350460-2-yury.norov@xxxxxxxxx/
> > > > >
> > > > > It seems like a good idea to at least make all cpumask functions use
> > > > > __always_inline; several already do.
> >
> > > I feel that if we decide making cpumask an __always_inline is the
> > > right way, we also should make underlying bitmap API __always_inline
> > > just as well. Otherwise, there will be a chance of having outlined
> > > bitmap helpers, which may confuse clang again.
> >
> > If this does not result in noticeable bloat, this may not be a bad
> > idea. I seem to recall this being an issue in the past for us but I
> > cannot seem to find the issue at this point. Commit 1dc01abad654
> > ("cpumask: Always inline helpers which use bit manipulation functions")
> > comes to mind.
>
> In the above quote, I already referenced Yury's previous post to do just
> that (__always_inline for all of bitmask and cpumask). I don't know why
> that wasn't ever merged, so I instead chose a smaller set that resolved
> my current problems.

Hi Brian,

I felt like your observed growth of the .text is caused by inlining
only part of bitmap-related functions, and if we do inline all of
them that might help.

I ran my own builds against this __always_inline thing for all bitmap
functions and their wrappers, namely those located in:
- bitmap.h
- cpumask.h
- find.h
- nodemask.h

When all 'inline's are replaced with '__always_inline', I found that
defconfig build saves ~1800 bytes with GCC9, and 100 bytes with
clang 18:
add/remove: 0/8 grow/shrink: 18/6 up/down: 253/-353 (-100)

(I didn't test the build against a fresher GCC and older clang, and
likely will not do that till the next weekend.)

>From my past experience, newer versions of compilers tend to inline
more aggressively, and thus generate bigger binaries. In case of
bitmaps and friends, however, we should always inline because this
inline 'small_const_nbits()' part is always resolved at compile time.
Thus, aggressive inlining is always a win.

> I can dust that off, rebase it, and give it a bloat check if that's
> preferable though.

If you want to take over this work - please go ahead. To make it
complete, we basically need to make sure that all bitmap APIs are
inlined, and check that the build doesn't grow for fresh and older
compilers - both clang and gcc.

Thanks,
Yury