Re: [PATCH] compiler_types: Introduce inline_for_performance

From: David Laight

Date: Mon Jan 19 2026 - 05:50:21 EST


On Mon, 19 Jan 2026 11:25:52 +0100
Eric Dumazet <edumazet@xxxxxxxxxx> wrote:

> On Mon, Jan 19, 2026 at 10:33 AM David Laight
> <david.laight.linux@xxxxxxxxx> wrote:
> >
> > On Sun, 18 Jan 2026 16:01:25 -0800
> > Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > > On Sun, 18 Jan 2026 22:58:02 +0000 David Laight <david.laight.linux@xxxxxxxxx> wrote:
> > >
> > > > > mm/ alone has 74 __always_inlines, none are documented, I don't know
> > > > > why they're present, many are probably wrong.
> > > > >
> > > > > Shit, uninlining only __get_user_pages_locked does this:
> > > > >
> > > > > >    text    data     bss     dec     hex filename
> > > > > >  115703   14018      64  129785   1faf9 mm/gup.o
> > > > > >  103866   13058      64  116988   1c8fc mm/gup.o-after
> > > >
> > > > The next questions are does anything actually run faster (either way),
> > > > and should anything at all be marked 'inline' rather than 'always_inline'.
> > > >
> > > > After all, if you call a function twice (not in a loop) you may
> > > > want a real function in order to avoid I-cache misses.
> > >
> > > yup
> >
> > I had two adjacent strlen() calls in a bit of code, the first was an
> > array (in a structure) and gcc inlined the 'word at a time' code, the
> > second was a pointer and it called the library function.
> > That had to be sub-optimal...
> >
> > > > But I'm sure there is a lot of code that is 'inline_for_bloat' :-)
> > >
> > > ooh, can we please have that?
> >
> > Or 'inline_to_speed_up_benchmark' and the associated 'unroll this loop
> > because that must make it faster'.
> >
> > > I do think that every always_inline should be justified and commented,
> > > but I haven't been energetic about asking for that.
> >
> > Apart from the 4-line functions where it is clearly obvious.
> > Especially since the compiler can still decide to not-inline them
> > if they are only 'inline'.
> >
> > > A fun little project would be to go through each one, figure out whether
> > > there were good reasons and if not, just remove them and see if anyone
> > > explains why that was incorrect.
> >
> > It's not just always_inline, a lot of the inline are dubious.
> > Probably why the networking code doesn't like it.
>
> Many __always_inline came because of clang's reluctance to inline
> small things, even if the resulting code size is bigger and slower.
>
> It is a bit unclear, this seems to happen when callers are 'big
> enough'. noinstr (callers) functions are also a problem.
>
> Let's take the list_add() call from dev_gro_receive() : clang does not
> inline it, for some reason.
>
> After adding __always_inline to list_add() and __list_add() we have
> smaller and more efficient code,
> for real workloads, not only benchmarks.

That falls into the '4-line function' category,
where s/inline/always_inline/ makes sense.
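
For reference, a minimal userspace sketch of the sort of helper I mean.
It is not the kernel's list.h: the sketch_* names and the
sketch_always_inline macro are made up for illustration, and the
attribute spelling assumes gcc or clang.

```c
#include <assert.h>

/* Assumption: userspace stand-in for the kernel's __always_inline. */
#define sketch_always_inline inline __attribute__((__always_inline__))

struct list_head {
	struct list_head *next, *prev;
};

/* Four stores: a call/return pair costs about as much as the body. */
static sketch_always_inline void
sketch_list_insert(struct list_head *entry, struct list_head *prev,
		   struct list_head *next)
{
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
}

/* Insert 'entry' immediately after 'head'. */
static sketch_always_inline void
sketch_list_add(struct list_head *entry, struct list_head *head)
{
	sketch_list_insert(entry, head, head->next);
}

/* Hypothetical caller: adds two entries and counts the list. */
static int sketch_demo(void)
{
	struct list_head head = { &head, &head };
	struct list_head a, b;
	int count = 0;

	sketch_list_add(&a, &head);
	sketch_list_add(&b, &head);	/* b ends up in front of a */
	assert(head.next == &b && b.next == &a && a.next == &head);

	for (struct list_head *p = head.next; p != &head; p = p->next)
		count++;
	return count;
}
```

With the attribute, neither helper should be emitted as an out-of-line
copy, whatever the size of the caller.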

> list_add 2212 - -2212

How many copies of list_add() is that... clearly a few.
Generating a real function for a 'static inline' in a header is stupid.
Pretty much the intent for those is to get them inlined.

I'm sure there was a suggestion to make inline mean 'always inline',
except there are places where it would just be bloat.
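
A sketch of the opposite case, again with made-up names and assuming
the gcc/clang attribute spelling: a helper called from two places (not
in a loop) that is forced out of line so both call sites share one
copy. The body here is only a stand-in for something big enough to
matter.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for a helper large enough that two inlined copies cost I-cache. */
static __attribute__((__noinline__)) uint32_t
sketch_sum_bytes(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = sum * 31 + buf[i];
	return sum;
}

/* Two call sites, one shared out-of-line body. */
static uint32_t sketch_hash_pair(const uint8_t *a, size_t alen,
				 const uint8_t *b, size_t blen)
{
	return sketch_sum_bytes(a, alen) ^ sketch_sum_bytes(b, blen);
}
```

Whether that is actually faster than two inlined copies would need
measuring on a real workload.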

David