Re: [PATCH] x86: Implement _THIS_IP_ using inline asm for 32-bit
From: Peter Zijlstra
Date: Thu May 21 2026 - 06:46:58 EST

On Thu, May 21, 2026 at 02:55:22AM -0700, H. Peter Anvin wrote:
> On May 21, 2026 12:08:01 AM PDT, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >On Thu, May 21, 2026 at 02:00:09AM +0200, Marco Elver wrote:
> >> Both GCC [1] and Clang [2] consider the generic version of _THIS_IP_ to
> >> be broken:
> >>
> >> #define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; })
> >>
> >> In particular, the address of a label is only expected to be used with a
> >> computed goto.
> >>
> >> While the generic version more or less works today, it is known to be
> >> brittle and may break with current and future optimizations. For
> >> example, Clang -O2 always returns 1 when this function is inlined:
> >>
> >> static inline unsigned long get_ip(void)
> >> { return ({ __label__ __here; __here: (unsigned long)&&__here; }); }
> >>
> >
> >Oh gawd :/
> >
> >> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120071 [1]
> >> Link: https://github.com/llvm/llvm-project/issues/138272 [2]
> >> Signed-off-by: Marco Elver <elver@xxxxxxxxxx>
> >> ---
> >> arch/x86/include/asm/linkage.h | 3 ++-
> >> 1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/x86/include/asm/linkage.h b/arch/x86/include/asm/linkage.h
> >> index a7294656ad90..bce3c6f4b94f 100644
> >> --- a/arch/x86/include/asm/linkage.h
> >> +++ b/arch/x86/include/asm/linkage.h
> >> @@ -13,11 +13,12 @@
> >> * The generic version tends to create spurious ENDBR instructions under
> >> * certain conditions.
> >> */
> >> -#define _THIS_IP_ ({ unsigned long __here; asm ("lea 0(%%rip), %0" : "=r" (__here)); __here; })
> >> +#define _THIS_IP_ ({ unsigned long __here; asm volatile("lea 0(%%rip), %0" : "=r" (__here)); __here; })
> >> #endif
> >>
> >> #ifdef CONFIG_X86_32
> >> #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
> >> +#define _THIS_IP_ ({ unsigned long __ip; asm volatile("call 1f\n1: pop %0" : "=r" (__ip)); __ip; })
> >
> >This will mess up the RSB and cause bad performance ripple effects for a
> >bit each use. Now, I don't think anybody still cares about performance
> >on 32bit (I certainly don't), so perhaps this is fine. But urgh.
>
> Most microarchitectures do *not* have a problem with call/pop, as they
> know that call with a zero offset is not going to return. The main
> exception was the Pentium 4.

Oh, that's good to know. Still, the "1: mov $1b, %reg" thing is shorter,
and generates the exact same code the compilers used to (and GCC still
does). Isn't that a better option?