More annoying code generation by clang

From: Linus Torvalds
Date: Thu Apr 04 2024 - 18:54:27 EST


So this doesn't really matter in any real life situation, but it
really grated on me.

Clang has this nasty habit of taking our nice asm constraints, and
turning them into worst-case garbage. It's been reported a couple of
times where we use "g" to tell the compiler that pretty much any
source to the asm works, and then clang takes that to mean "I will
take that to use 'memory'" even when that makes no sense what-so-ever.

See for example

https://lore.kernel.org/all/CAHk-=wgobnShg4c2yyMbk2p=U-wmnOmX_0=b3ZY_479Jjey2xw@xxxxxxxxxxxxxx/

where I was ranting about clang just doing pointlessly stupid things.

However, I found a case where yes, clang does pointlessly stupid
things, but it's at least _partly_ our fault, and gcc can't generate
optimal code either.

We have this fairly critical code in __fget_files_rcu() to look up a
'struct file *' from an fd, and it does this:

/* Mask is a 0 for invalid fd's, ~0 for valid ones */
nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);

and clang makes a *horrid* mess of it, generating this code:

movl %edi, %r14d
movq 32(%rbx), %rdx
movl (%rdx), %eax
movq %rax, 8(%rsp)
cmpq 8(%rsp), %r14
sbbq %rcx, %rcx

which is just crazy. Notice how it does that "move rax to stack, then
do the compare against the stack", instead of just using %rax.

In fact, that function shouldn't have a stack frame at all, and the
only reason it is generated is because of this whole oddity.

All clang's fault, right?

Yeah, mostly. But it turns out that what really messes with clangs
little head is that the x86 array_index_mask_nospec() function is
being a bit annoying.

This is what we do:

static __always_inline unsigned long
array_index_mask_nospec(unsigned long index,
unsigned long size)
{
unsigned long mask;

asm volatile ("cmp %1,%2; sbb %0,%0;"
:"=r" (mask)
:"g"(size),"r" (index)
:"cc");
return mask;
}

and look at the use again:

nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);

here all the values are actually 'unsigned int'. So what happens is
that clang can't just use the fdt->max_fds value *directly* from
memory, because it needs to be expanded from 32-bit to 64-bit because
we've made our array_index_mask_nospec() function only work on 64-bit
'unsigned long' values.

So it turns out that by massaging this a bit, and making it just be a
macro - so that the asm can decide that "I can do this in 32-bit" - I
can get clang to generate much better code.

Clang still absolutely hates the "g" constraint, so to get clang to
really get this right I have to use "ir" instead of "g". Which is
wrong. Because gcc does this right, and could use the memory op
directly. But even gcc cannot do that with our *current* function,
because of that "the memory value is 32-bit, we require a 64-bit
value"

Anyway, I can get gcc to generate the right code:

movq 32(%r13), %rdx
cmp (%rdx),%ebx
sbb %esi,%esi

which is basically the right code for the six crazy instructions clang
generates. And if I make the "g" be "ir", I can get clang to generate

movq 32(%rdi), %rcx
movl (%rcx), %eax
cmpl %eax, %esi
sbbl %esi, %esi

which is the same thing, but with that (pointless) load to a register.

And now clang doesn't generate that stack frame at all.

Anyway, this was a long email to explain the odd attached patch.

Comments? Note that this patch is *entirely* untested, I have done
this purely by looking at the code generation in fs/file.c.

Linus
arch/x86/include/asm/barrier.h | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 66e57c010392..6159d2cbbfde 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -33,20 +33,15 @@
* Returns:
* 0 - (index < size)
*/
-static __always_inline unsigned long array_index_mask_nospec(unsigned long index,
- unsigned long size)
-{
- unsigned long mask;
-
- asm volatile ("cmp %1,%2; sbb %0,%0;"
- :"=r" (mask)
- :"g"(size),"r" (index)
- :"cc");
- return mask;
-}
-
-/* Override the default implementation from linux/nospec.h. */
-#define array_index_mask_nospec array_index_mask_nospec
+#define array_index_mask_nospec(idx,sz) ({ \
+ typeof((idx)+(sz)) __idx = (idx); \
+ typeof(__idx) __sz = (sz); \
+ typeof(__idx) __mask; \
+ asm volatile ("cmp %1,%2; sbb %0,%0" \
+ :"=r" (__mask) \
+ :"ir"(__sz),"r" (__idx) \
+ :"cc"); \
+ __mask; })

/* Prevent speculative execution past this barrier. */
#define barrier_nospec() asm volatile("lfence":::"memory")