Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic locking insns
From: Uros Bizjak
Date: Sun Mar 09 2025 - 03:50:24 EST

On Sat, Mar 8, 2025 at 8:08 PM H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>
> On 2/28/25 08:48, Dave Hansen wrote:
> > On 2/28/25 04:35, Uros Bizjak wrote:
> >> The code size of the resulting x86_64 defconfig object file increases
> >> by 33.264 kbytes, representing a 0.12% code size increase:
> >>
> >>     text    data     bss      dec     hex filename
> >> 27450107 4633332  814148 32897587 1f5fa33 vmlinux-old.o
> >> 27483371 4633784  814148 32931303 1f67de7 vmlinux-new.o
> >
> > So, first of all, thank you for including some objective measurement of
> > the impact of your patches. It's much appreciated.
> >
> > But I think the patches need to come with a solid theory of why they're
> > good. The minimum bar for that, I think, is *some* kind of actual
> > real-world performance test. I'm not picky. Just *something* that spends
> > a lot of time in the kernel and ideally where a profile points at some
> > of the code you're poking here.
> >
> > I'm seriously not picky: will-it-scale, lmbench, dbench, kernel
> > compiles. *ANYTHING*. *ANY* hardware. Run it on your laptop.
> >
> > But performance patches need to come with performance *numbers*.
>
> Incidentally, this is exactly the reason why gcc added "asm inline" *at
> our request*. We just haven't caught up with it everywhere yet.
>
> In fact, I would wonder if we shouldn't simply do:
>
> #define asm __asm__ __inline__
> #define asm_noinline __asm__
>
> ... in other words, to make asm inline an opt-out instead of an opt-in.
> It is comparatively unusual that we do complex things in inline assembly
> that we would want gcc to treat as something that should be avoided.

I don't think we need such radical changes. There are only a few
groups of instructions, nicely hidden behind macros, that need asm
noinline. Alternatives (gcc counted them as 20 - 23 instructions) are
already using asm inline (please see
arch/x86/include/asm/alternative.h) in their high-level macros, and my
proposed patch converts all asm using LOCK_PREFIX by amending macros
in 7 header files.

Uros.
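For illustration, here is a minimal userspace sketch of the kind of
conversion discussed above: a locked add written with "asm inline", so
that gcc treats the statement as minimum-size when making inlining
decisions instead of estimating its size from the asm template text.
All names (SKETCH_LOCK_PREFIX, sketch_asm_inline, atomic_sketch_t,
sketch_atomic_add) are hypothetical stand-ins, not the kernel's
definitions; the kernel's real LOCK_PREFIX and asm_inline macros
differ. The sketch assumes gcc 9 or newer for "asm inline" and falls
back to plain asm elsewhere, and to a compiler builtin on non-x86
targets.

```c
/* Hypothetical stand-in for the kernel's LOCK_PREFIX macro. */
#define SKETCH_LOCK_PREFIX "lock; "

/*
 * Assumption: "asm inline" is available in gcc >= 9. Other compilers
 * get plain asm here, which is functionally identical and only changes
 * the compiler's size estimate for inlining heuristics.
 */
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 9)
#define sketch_asm_inline __asm__ __inline__
#else
#define sketch_asm_inline __asm__
#endif

/* Hypothetical counterpart of atomic_t. */
typedef struct {
	int counter;
} atomic_sketch_t;

/* Atomically add i to v->counter. */
static inline void sketch_atomic_add(int i, atomic_sketch_t *v)
{
#if defined(__x86_64__) || defined(__i386__)
	sketch_asm_inline volatile(SKETCH_LOCK_PREFIX "addl %1,%0"
				   : "+m" (v->counter)
				   : "ir" (i)
				   : "memory");
#else
	/* Non-x86 fallback so the sketch still builds and runs. */
	__atomic_fetch_add(&v->counter, i, __ATOMIC_SEQ_CST);
#endif
}
```

Starting from a counter value of 5, calling sketch_atomic_add(3, &v)
leaves v.counter at 8. The generated instruction is the same with or
without the inline qualifier; only the compiler's inlining cost
estimate for the enclosing function changes.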