Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic locking insns
From: Uros Bizjak
Date: Thu Mar 06 2025 - 08:56:48 EST
On Thu, Mar 6, 2025 at 10:57 AM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
>
> * Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
>
> > According to:
> >
> > https://gcc.gnu.org/onlinedocs/gcc/Size-of-an-asm.html
> >
> > the usage of asm pseudo directives in the asm template can confuse
> > the compiler to wrongly estimate the size of the generated
> > code.
> >
> > The LOCK_PREFIX macro expands to several asm pseudo directives, so
> > its usage in atomic locking insns causes instruction length estimate
> > to fail significantly (the specially instrumented compiler reports
> > the estimated length of these asm templates to be 6 instructions long).
> >
> > This incorrect estimate further causes suboptimal inlining decisions,
> > suboptimal instruction scheduling and suboptimal code block alignments
> > for functions that use these locking primitives.
> >
> > Use asm_inline instead:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2018-December/512349.html
> >
> > which is a feature that makes GCC pretend some inline assembler code
> > is tiny (while it would think it is huge), instead of just asm.
> >
> > For code size estimation, the size of the asm is then taken as
> > the minimum size of one instruction, ignoring how many instructions
> > the compiler thinks it is.
> >
> > The code size of the resulting x86_64 defconfig object file increases
> > by 33.264 kbytes, representing a 1.2% code size increase:
> >
> >     text    data    bss      dec     hex filename
> > 27450107 4633332 814148 32897587 1f5fa33 vmlinux-old.o
> > 27483371 4633784 814148 32931303 1f67de7 vmlinux-new.o
> >
> > mainly due to different inlining decisions of -O2 build.
>
> So my request here would be not more benchmark figures (I don't think
> it's a realistic expectation for contributors to be able to measure
> much of an effect with such a type of change, let alone be certain
> what a macro or micro-benchmark measures is causally connected with
> the patch), but I'd like to ask for some qualitative analysis on the
> code generation side:
>
> - +1.2% code size increase is a lot, especially if it's under the
> default build flags of the kernel. Where does the extra code come
> from?
>
> - Is there any effect on Clang? Are its inlining decisions around
> these asm() statements comparable, worse/better?

FTR, clang recognizes "asm inline", but there was practically no
difference in code size (18 bytes of text):

    text    data    bss      dec     hex filename
27577163 4503078 807732 32887973 1f5d4a5 vmlinux-clang-patched.o
27577181 4503078 807732 32887991 1f5d4b7 vmlinux-clang-unpatched.o
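
To illustrate the size-estimate issue in isolation, here is a minimal
userspace sketch. It is not the kernel's code: DEMO_LOCK_PREFIX,
demo_asm_inline, the .demo_lock_sites section and both helper names are
made up for the example, and the compiler version checks are assumptions.
The prefix macro only mimics the shape of LOCK_PREFIX (several assembler
directives followed by the lock prefix), so the plain asm template spans
six newline-separated lines while emitting a single instruction.

```c
/* Sketch only: every demo_* / DEMO_* name and the section name below are
 * hypothetical stand-ins, not the kernel's definitions. */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__ELF__)

/* Assumed compiler versions that accept the "inline" asm qualifier. */
#if (defined(__clang__) && __clang_major__ >= 11) || \
    (!defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 9)
#define demo_asm_inline __asm__ __inline
#else
#define demo_asm_inline __asm__
#endif

/* Records the location of each lock prefix in a side section. The
 * directives emit nothing into .text, but add lines to the template. */
#define DEMO_LOCK_PREFIX                        \
        ".pushsection .demo_lock_sites,\"a\"\n" \
        ".balign 4\n"                           \
        ".long 671f - .\n"                      \
        ".popsection\n"                         \
        "671:\n\tlock "

/* Plain asm: GCC sizes it by counting newline/';' separated statements. */
static inline void demo_add_plain(int i, int *v)
{
        __asm__ __volatile__(DEMO_LOCK_PREFIX "addl %1, %0"
                             : "+m" (*v) : "ir" (i) : "memory");
}

/* asm inline: same template, sized as minimal for inlining decisions. */
static inline void demo_add_inline(int i, int *v)
{
        demo_asm_inline __volatile__(DEMO_LOCK_PREFIX "addl %1, %0"
                                     : "+m" (*v) : "ir" (i) : "memory");
}

#else /* Other targets: builtin fallback so the sketch still builds. */

static inline void demo_add_plain(int i, int *v)
{
        __atomic_fetch_add(v, i, __ATOMIC_SEQ_CST);
}

static inline void demo_add_inline(int i, int *v)
{
        __atomic_fetch_add(v, i, __ATOMIC_SEQ_CST);
}

#endif
```

Both helpers perform the same atomic add; only the size the compiler
assumes for the asm statement differs. Whether that changes inlining of a
given caller depends on the caller and the compiler version, so comparing
the -O2 -S output of real callers is the way to check.
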
Uros.