Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic locking insns

From: H. Peter Anvin
Date: Sat Mar 08 2025 - 14:09:04 EST


On 2/28/25 08:48, Dave Hansen wrote:
> On 2/28/25 04:35, Uros Bizjak wrote:
>> The code size of the resulting x86_64 defconfig object file increases
>> by 33,264 bytes, representing a 0.12% code size increase:
>>
>>     text    data    bss      dec     hex filename
>> 27450107 4633332 814148 32897587 1f5fa33 vmlinux-old.o
>> 27483371 4633784 814148 32931303 1f67de7 vmlinux-new.o
>
> So, first of all, thank you for including some objective measurement of
> the impact of your patches. It's much appreciated.
>
> But I think the patches need to come with a solid theory of why they're
> good. The minimum bar for that, I think, is *some* kind of actual
> real-world performance test. I'm not picky. Just *something* that spends
> a lot of time in the kernel and ideally where a profile points at some
> of the code you're poking here.
>
> I'm seriously not picky: will-it-scale, lmbench, dbench, kernel
> compiles. *ANYTHING*. *ANY* hardware. Run it on your laptop.
>
> But performance patches need to come with performance *numbers*.

Incidentally, this is exactly the reason why gcc added "asm inline"
*at our request*. We just haven't caught up with it everywhere yet.
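
For illustration, here is a minimal sketch of what the qualifier
changes, assuming gcc 9+ semantics; the helper names and the
.discard.sketch section are invented for the example and are not the
kernel's real macros:

static inline void sketch_inc(unsigned long *v)
{
	/*
	 * gcc estimates the size of an asm statement roughly from the
	 * number of lines in its template, so a template padded out by
	 * alternatives/section boilerplate looks "huge" to the inliner
	 * even though it emits a single locked instruction.
	 */
	asm volatile("# pretend this is alternatives boilerplate\n\t"
		     ".pushsection .discard.sketch\n\t"
		     ".popsection\n\t"
		     "lock incq %0"
		     : "+m" (*v));
}

static inline void sketch_inc_hinted(unsigned long *v)
{
	/*
	 * Same template, but the "inline" qualifier tells gcc to treat
	 * the statement as minimum size for its inlining heuristics;
	 * the generated code is identical.
	 */
	asm inline volatile("# pretend this is alternatives boilerplate\n\t"
			    ".pushsection .discard.sketch\n\t"
			    ".popsection\n\t"
			    "lock incq %0"
			    : "+m" (*v));
}

(The patches under discussion spell that second form via the kernel's
asm_inline wrapper.)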

In fact, I would wonder if we shouldn't simply do:

#define asm __asm__ __inline__
#define asm_noinline __asm__

... in other words, to make asm inline an opt-out instead of an opt-in.
It is comparatively unusual for us to write inline assembly complex
enough that we would actually want gcc to treat it as large code to be
avoided when making inlining decisions.
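
As a rough sketch of how call sites would read under that opt-out
scheme (the helpers below are invented for illustration; only the
asm/asm_noinline spellings come from the defines above):

/*
 * Common case: a short patched or locked sequence keeps being written
 * as plain "asm" and, with the defines above, is automatically treated
 * as minimum size for gcc's inlining heuristics.
 */
static inline void sketch_stac(void)
{
	asm volatile("stac" ::: "memory");
}

/*
 * Rare case: a genuinely large blob, where we *do* want gcc to see the
 * true size and think twice about inlining callers, opts out
 * explicitly.
 */
static inline void sketch_big_blob(void)
{
	asm_noinline volatile("nop\n\t" "nop\n\t" "nop\n\t" "nop\n\t"
			      "nop\n\t" "nop\n\t" "nop\n\t" "nop");
}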

-hpa