Re: More annoying code generation by clang

From: Linus Torvalds
Date: Mon Apr 08 2024 - 15:43:00 EST


On Mon, 8 Apr 2024 at 11:32, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> It's been reported long ago, it seems to be hard to fix.
>
> I suspect the issue is that the inline asm format is fairly closely
> related to the gcc machine descriptions (look at the machine
> descriptor files in gcc, and if you can ignore the horrid LISP-style
> syntax you see how close they are).

Actually, one of the github issues pages has more of an explanation
(and yes, it's tied to impedance issues between the inline asm syntax
and how clang works):

https://github.com/llvm/llvm-project/issues/20571#issuecomment-980933442

so I wrote more of a commit log and did that "ASM_SOURCE_G" thing
(except I decided to call it "input" instead of "source", since that's
the standard inline asm language).

This version also has that output size fixed, and the commit message
talks about it.

This does *not* fix other inline asms to use "ASM_INPUT_G/RM".

I think it's mainly some of the bitop code that people have noticed
before - fls and variable_ffs() and friends.

I suspect clang is more common in the arm64 world than it is for
x86-64 kernel developers, and arm64 inline asm basically never uses
"rm" or "g" since arm64 doesn't have instructions that take either a
register or a memory operand.

Anyway, with gcc this generates

cmp (%rdx),%ebx; sbb %rax,%rax # _7->max_fds, fd, __mask

IOW, it uses the memory location for "max_fds". It couldn't do that
before, because it used to think that it always had to do the compare
in 64 bits, and the memory location is only 32-bit.

With clang, this generates

movl (%rcx), %eax
cmpl %eax, %edi
sbbq %rdi, %rdi

which has that extra register use, but is at least much better than
what it used to generate with crazy "load into register, spill to
stack, then compare against stack contents".

Linus
From 7779d285040bab685296da2cd0afe9d2d7b58969 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Mon, 8 Apr 2024 11:38:30 -0700
Subject: [PATCH] x86: improve array_index_mask_nospec() code generation

Don't force the inputs to be 'unsigned long', when the comparison can
easily be done in 32-bit if that's more appropriate.

Note that while we can look at the inputs to choose an appropriate size
for the compare instruction, the output is fixed at 'unsigned long'.
That's not technically optimal either, since a 32-bit 'sbbl' would often
be sufficient.

But for the outgoing mask we don't know how the mask ends up being used
(ie we have uses thathave an incoming 32-bit array index, but end up
using the mask for other things). That said, it only costs the extra
REX prefix to always generate the 64-bit mask.

[ A 'sbbl' also always technically generates a 64-bit mask, but with the
upper 32 bits clear: that's fine for when the incoming index that will
be masked is already 32-bit, but not if you use the mask to mask a
pointer afterwards, like the file table lookup does ]

Also, work around clang problems with asm constraints that have multiple
possibilities, particularly "g" and "rm". Clang seems to turn inputs
like that into the most generic form, which is the memory input - but to
make matters worse, clang won't even use a possible original memory
location, but will spill the value to stack, and use the stack for the
asm input.

See

https://github.com/llvm/llvm-project/issues/20571#issuecomment-980933442

for some explanation of why clang has this strange behavior, but the end
result is that "g" and "rm" really end up generating horrid code.

Link: https://github.com/llvm/llvm-project/issues/20571
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: H. Peter Anvin <hpa@xxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
---
arch/x86/include/asm/barrier.h | 24 ++++++++++--------------
include/linux/compiler-clang.h | 12 ++++++++++++
include/linux/compiler_types.h | 9 +++++++++
3 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 66e57c010392..234fd892e39e 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -33,20 +33,16 @@
* Returns:
* 0 - (index < size)
*/
-static __always_inline unsigned long array_index_mask_nospec(unsigned long index,
- unsigned long size)
-{
- unsigned long mask;
-
- asm volatile ("cmp %1,%2; sbb %0,%0;"
- :"=r" (mask)
- :"g"(size),"r" (index)
- :"cc");
- return mask;
-}
-
-/* Override the default implementation from linux/nospec.h. */
-#define array_index_mask_nospec array_index_mask_nospec
+#define array_index_mask_nospec(idx,sz) ({ \
+ typeof((idx)+(sz)) __idx = (idx); \
+ typeof(__idx) __sz = (sz); \
+ unsigned long __mask; \
+ asm volatile ("cmp %1,%2; sbb %0,%0" \
+ :"=r" (__mask) \
+ :ASM_INPUT_G (__sz), \
+ "r" (__idx) \
+ :"cc"); \
+ __mask; })

/* Prevent speculative execution past this barrier. */
#define barrier_nospec() asm volatile("lfence":::"memory")
diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index 49feac0162a5..0dee061fd7a6 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -118,3 +118,15 @@

#define __diag_ignore_all(option, comment) \
__diag_clang(13, ignore, option)
+
+/*
+ * clang has horrible behavior with "g" or "rm" constraints for asm
+ * inputs, turning them into something worse than "m". Avoid using
+ * constraints with multiple possible uses (but "ir" seems to be ok):
+ *
+ * https://github.com/llvm/llvm-project/issues/20571
+ * https://github.com/llvm/llvm-project/issues/30873
+ * https://github.com/llvm/llvm-project/issues/34837
+ */
+#define ASM_INPUT_G "ir"
+#define ASM_INPUT_RM "r"
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 2abaa3a825a9..e53acd310545 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -380,6 +380,15 @@ struct ftrace_likely_data {
#define asm_goto_output(x...) asm volatile goto(x)
#endif

+/*
+ * Clang has trouble with constraints with multiple
+ * alternative behaviors (mainly "g" and "rm").
+ */
+#ifndef ASM_INPUT_G
+ #define ASM_INPUT_G "g"
+ #define ASM_INPUT_RM "rm"
+#endif
+
#ifdef CONFIG_CC_HAS_ASM_INLINE
#define asm_inline asm __inline
#else
--
2.44.0.330.g4d18c88175