[PATCH] x86/processor.h: Force inlining of cpu_relax()

From: Denys Vlasenko
Date: Thu Sep 24 2015 - 08:02:42 EST

On x86, cpu_relax() simply calls rep_nop(), which generates one
instruction, PAUSE (aka REP NOP).

With this config:
gcc-4.7.2 does not always inline rep_nop(): it generates
several copies of this:

<rep_nop> (16 copies, 194 calls):
  55                      push   %rbp
  48 89 e5                mov    %rsp,%rbp
  f3 90                   pause
  5d                      pop    %rbp
  c3                      retq

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

This patch fixes this via s/inline/__always_inline/
on rep_nop() and cpu_relax().
(Forcing inlining only on rep_nop() causes gcc to
deinline cpu_relax(), with almost no change in generated code).
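
For readers who want to try the effect outside the kernel tree, here is a
minimal user-space sketch of the same pattern. It is an illustration under
stated assumptions, not the kernel code: the __always_inline fallback
definition, the non-x86 branch, and the spin_wait() helper are hypothetical
and exist only for this example.

```c
#include <assert.h>

/* Assumption: user-space stand-in for the kernel's __always_inline macro.
 * Some libc headers already define it, hence the guard. */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

/* REP NOP (PAUSE): with __always_inline the single instruction is emitted
 * at each call site instead of in an out-of-line copy. */
static __always_inline void rep_nop(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("rep; nop" ::: "memory");
#else
	/* Assumption: on non-x86 targets fall back to a compiler barrier. */
	__asm__ __volatile__("" ::: "memory");
#endif
}

static __always_inline void cpu_relax(void)
{
	rep_nop();
}

/* Hypothetical helper: poll *flag until it is nonzero or max_spins polls
 * have been made. Returns the number of cpu_relax() calls issued. */
static unsigned int spin_wait(const volatile int *flag, unsigned int max_spins)
{
	unsigned int spins = 0;

	assert(flag != 0);
	while (!*flag && spins < max_spins) {
		cpu_relax();
		spins++;
	}
	return spins;
}
```

Built with gcc -Os, the loop in spin_wait() should contain the pause
instruction directly, with no call to a separate rep_nop() body; checking
the output with objdump -d is the easiest way to confirm this for a given
compiler version.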

    text     data      bss       dec     hex filename
88118971 19905208 36421632 144445811 89c1173 vmlinux.before
88118139 19905208 36421632 144444979 89c0e33 vmlinux

Signed-off-by: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
CC: Ingo Molnar <mingo@xxxxxxxxxx>
CC: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
CC: "H. Peter Anvin" <hpa@xxxxxxxxx>
CC: Borislav Petkov <bp@xxxxxxxxx>
CC: Brian Gerst <brgerst@xxxxxxxxx>
CC: x86@xxxxxxxxxx
CC: linux-kernel@xxxxxxxxxxxxxxx
---
 arch/x86/include/asm/processor.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 19577dd..b55f309 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -556,12 +556,12 @@ static inline unsigned int cpuid_edx(unsigned int op)

 /* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
-static inline void rep_nop(void)
+static __always_inline void rep_nop(void)
 {
 	asm volatile("rep; nop" ::: "memory");
 }
 
-static inline void cpu_relax(void)
+static __always_inline void cpu_relax(void)
 {
 	rep_nop();
 }
