Re: [PATCH RFC 3/5] noinstr: Use asm_inline() in instrumentation_{begin,end}()

From: Josh Poimboeuf
Date: Tue Apr 08 2025 - 12:47:14 EST

Next message: Nhat Pham: "Re: [RFC PATCH 00/14] Virtual Swap Space"
Previous message: Suren Baghdasaryan: "Re: [RFC PATCH 2/2] MAINTAINERS: add MM subsection for the page allocator"
In reply to: Uros Bizjak: "Re: [PATCH RFC 3/5] noinstr: Use asm_inline() in instrumentation_{begin,end}()"
Next in thread: Josh Poimboeuf: "[PATCH RFC 4/5] x86/alternative: Improve code generation readability"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, Apr 08, 2025 at 01:10:14PM +0200, Uros Bizjak wrote:
> On Tue, Apr 8, 2025 at 10:30 AM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> > * Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> >
> > > Use asm_inline() to prevent the compiler from making poor inlining
> > > decisions based on the length of the objtool annotations.
> > >
> > > For a defconfig kernel built with GCC 14.2.1, bloat-o-meter reports a
> > > 0.18% text size increase:
> > >
> > > add/remove: 88/433 grow/shrink: 967/487 up/down: 87579/-52630 (34949)
> > > Total: Before=19448407, After=19483356, chg +0.18%
> > >
> > > Presumably the text growth is due to increased inlining. A net total of
> > > 345 functions were removed.
> >
> > Since +0.18% puts this into the 'significant' category of .text size
> > increases, it would be nice to see a bit more details about the nature
> > of these function calls removed: were they really explicit calls to
> > __instrumentation_begin()/end(), or somehow tail-call optimized out, or
> > something else?

The instrumentation macros are used by WARN*() and lockdep, so this can
affect a lot of functions.

BTW, without the objtool annotations, each of those macros resolves to a
single NOP. So using inline_asm() seems obviously correct here as it
accurately communicates the code size to the compiler. I'm not sure if
that was clear from the description.

> > Also, I'm wondering where the 34,949 bytes bloat comes from: with 345
> > functions removed that's 100 bytes per function? Doesn't sound right.
>
> Please note that removed functions can be inlined at several places. E.g.:
>
> $ grep "<encode_string>" objdump.old
>
> 00000000004506e0 <encode_string>:
> 45113c: e8 9f f5 ff ff call 4506e0 <encode_string>
> 452bcb: e9 10 db ff ff jmp 4506e0 <encode_string>
> 453d33: e8 a8 c9 ff ff call 4506e0 <encode_string>
> 453ef7: e8 e4 c7 ff ff call 4506e0 <encode_string>
> 45549f: e8 3c b2 ff ff call 4506e0 <encode_string>
> 455843: e8 98 ae ff ff call 4506e0 <encode_string>
> 455b37: e8 a4 ab ff ff call 4506e0 <encode_string>
> 455b47: e8 94 ab ff ff call 4506e0 <encode_string>
> 4564fa: e8 e1 a1 ff ff call 4506e0 <encode_string>
> 456669: e8 72 a0 ff ff call 4506e0 <encode_string>
> 456691: e8 4a a0 ff ff call 4506e0 <encode_string>
> 4566a0: e8 3b a0 ff ff call 4506e0 <encode_string>
> 4569aa: e8 31 9d ff ff call 4506e0 <encode_string>
> 456e79: e9 62 98 ff ff jmp 4506e0 <encode_string>
> 456efe: e9 dd 97 ff ff jmp 4506e0 <encode_string>
>
> all these calls now inline:
>
> encode_string 58 - -58
>
> where for example encode_putfh() grows by:
>
> encode_putfh 70 118 +48

Thanks for looking! That makes sense: encode_string() uses
WARN_ON_ONCE() which uses instrumentation_begin()/end().
encode_string() is getting inlined now that GCC has more accurate
information about its size.

> > Also, is the bloat-o-meter output limited to the .text section, or does
> > it include growth in out-of-line sections too?
>
> bloat-o-meter by default looks at all sybol types ("tTdDbBrRvVwW" as
> returned by nm). Adding "-c" option will categorize the output by
> symbol type (this is x86_64 defconfig change with gcc-4.2.1):
>
> add/remove: 72/350 grow/shrink: 918/348 up/down: 80532/-46857 (33675)
> Function old new delta
> ...
> Total: Before=16685068, After=16718743, chg +0.20%
> add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
> Data old new delta
> Total: Before=4471889, After=4471889, chg +0.00%
> add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-12 (-12)
> RO Data old new delta
> icl_voltage_level_max_cdclk 12 - -12
> Total: Before=1783310, After=1783298, chg -0.00%

Right. That means that bloat-o-meter's results include any
text.unlikely growth/shrinkage, because that code is contained by
symbols (typically .cold subfunctons).

I suppose it would be more helpful if .text.unlikely were excluded or
separated out. .text.unlikely code growth is always a good thing, as
opposed to .text for which growth can be good or bad.

--
Josh

Next message: Nhat Pham: "Re: [RFC PATCH 00/14] Virtual Swap Space"
Previous message: Suren Baghdasaryan: "Re: [RFC PATCH 2/2] MAINTAINERS: add MM subsection for the page allocator"
In reply to: Uros Bizjak: "Re: [PATCH RFC 3/5] noinstr: Use asm_inline() in instrumentation_{begin,end}()"
Next in thread: Josh Poimboeuf: "[PATCH RFC 4/5] x86/alternative: Improve code generation readability"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]