Re: More annoying code generation by clang

From: Ingo Molnar
Date: Sat Apr 06 2024 - 06:56:39 EST



[ I've Cc:-ed a few more people who might be interested in this. ]
[ Copy of Linus's email below. ]

* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> So this doesn't really matter in any real life situation, but it
> really grated on me.
>
> Clang has this nasty habit of taking our nice asm constraints, and
> turning them into worst-case garbage. It's been reported a couple of
> times: where we use "g" to tell the compiler that pretty much any
> source operand to the asm works, clang takes that to mean it should
> use a 'memory' operand, even when that makes no sense whatsoever.
>
> See for example
>
> https://lore.kernel.org/all/CAHk-=wgobnShg4c2yyMbk2p=U-wmnOmX_0=b3ZY_479Jjey2xw@xxxxxxxxxxxxxx/
>
> where I was ranting about clang just doing pointlessly stupid things.
>
> However, I found a case where yes, clang does pointlessly stupid
> things, but it's at least _partly_ our fault, and gcc can't generate
> optimal code either.
>
> We have this fairly critical code in __fget_files_rcu() to look up a
> 'struct file *' from an fd, and it does this:
>
> /* Mask is a 0 for invalid fd's, ~0 for valid ones */
> nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
>
> and clang makes a *horrid* mess of it, generating this code:
>
> movl %edi, %r14d
> movq 32(%rbx), %rdx
> movl (%rdx), %eax
> movq %rax, 8(%rsp)
> cmpq 8(%rsp), %r14
> sbbq %rcx, %rcx
>
> which is just crazy. Notice how it does that "move rax to stack, then
> do the compare against the stack", instead of just using %rax.
>
> In fact, that function shouldn't have a stack frame at all, and the
> only reason it is generated is because of this whole oddity.
>
> All clang's fault, right?
>
> Yeah, mostly. But it turns out that what really messes with clang's
> little head is that the x86 array_index_mask_nospec() function is
> being a bit annoying.
>
> This is what we do:
>
> static __always_inline unsigned long
> array_index_mask_nospec(unsigned long index,
>                         unsigned long size)
> {
>         unsigned long mask;
>
>         asm volatile ("cmp %1,%2; sbb %0,%0;"
>                       :"=r" (mask)
>                       :"g"(size),"r" (index)
>                       :"cc");
>         return mask;
> }
>
> and look at the use again:
>
> nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
>
> here all the values are actually 'unsigned int'. So what happens is
> that clang can't just use the fdt->max_fds value *directly* from
> memory, because it needs to be expanded from 32-bit to 64-bit because
> we've made our array_index_mask_nospec() function only work on 64-bit
> 'unsigned long' values.
>
> So it turns out that by massaging this a bit, and making it just be a
> macro - so that the asm can decide that "I can do this in 32-bit" - I
> can get clang to generate much better code.
>
> Clang still absolutely hates the "g" constraint, so to get clang to
> really get this right I have to use "ir" instead of "g". Which is
> wrong. Because gcc does this right, and could use the memory op
> directly. But even gcc cannot do that with our *current* function,
> because of that "the memory value is 32-bit, we require a 64-bit
> value" mismatch.
>
> Anyway, I can get gcc to generate the right code:
>
> movq 32(%r13), %rdx
> cmp (%rdx),%ebx
> sbb %esi,%esi
>
> which is basically the right code for the six crazy instructions clang
> generates. And if I make the "g" be "ir", I can get clang to generate
>
> movq 32(%rdi), %rcx
> movl (%rcx), %eax
> cmpl %eax, %esi
> sbbl %esi, %esi
>
> which is the same thing, but with that (pointless) load to a register.
>
> And now clang doesn't generate that stack frame at all.
>
> Anyway, this was a long email to explain the odd attached patch.
>
> Comments? Note that this patch is *entirely* untested, I have done
> this purely by looking at the code generation in fs/file.c.
>
> Linus

> arch/x86/include/asm/barrier.h | 23 +++++++++--------------
> 1 file changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> index 66e57c010392..6159d2cbbfde 100644
> --- a/arch/x86/include/asm/barrier.h
> +++ b/arch/x86/include/asm/barrier.h
> @@ -33,20 +33,15 @@
> * Returns:
> * 0 - (index < size)
> */
> -static __always_inline unsigned long array_index_mask_nospec(unsigned long index,
> -							       unsigned long size)
> -{
> -	unsigned long mask;
> -
> -	asm volatile ("cmp %1,%2; sbb %0,%0;"
> -			:"=r" (mask)
> -			:"g"(size),"r" (index)
> -			:"cc");
> -	return mask;
> -}
> -
> -/* Override the default implementation from linux/nospec.h. */
> -#define array_index_mask_nospec array_index_mask_nospec
> +#define array_index_mask_nospec(idx,sz) ({	\
> +	typeof((idx)+(sz)) __idx = (idx);	\
> +	typeof(__idx) __sz = (sz);		\
> +	typeof(__idx) __mask;			\
> +	asm volatile ("cmp %1,%2; sbb %0,%0"	\
> +		:"=r" (__mask)			\
> +		:"ir"(__sz),"r" (__idx)		\
> +		:"cc");				\
> +	__mask; })
>
> /* Prevent speculative execution past this barrier. */
> #define barrier_nospec() asm volatile("lfence":::"memory")