Re: More annoying code generation by clang

From: Uros Bizjak
Date: Sat Apr 06 2024 - 08:30:52 EST


On Sat, Apr 6, 2024 at 12:56 PM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
>
> [ I've Cc:-ed a few more people who might be interested in this. ]
> [ Copy of Linus's email below. ]
>
> * Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > So this doesn't really matter in any real life situation, but it
> > really grated on me.
> >
> > Clang has this nasty habit of taking our nice asm constraints, and
> > turning them into worst-case garbage. It's been reported a couple of
> > times where we use "g" to tell the compiler that pretty much any
> > source to the asm works, and then clang takes that to mean "I will
> > take that to use 'memory'" even when that makes no sense what-so-ever.
> >
> > See for example
> >
> > https://lore.kernel.org/all/CAHk-=wgobnShg4c2yyMbk2p=U-wmnOmX_0=b3ZY_479Jjey2xw@xxxxxxxxxxxxxx/
> >
> > where I was ranting about clang just doing pointlessly stupid things.
> >
> > However, I found a case where yes, clang does pointlessly stupid
> > things, but it's at least _partly_ our fault, and gcc can't generate
> > optimal code either.
> >
> > We have this fairly critical code in __fget_files_rcu() to look up a
> > 'struct file *' from an fd, and it does this:
> >
> > /* Mask is a 0 for invalid fd's, ~0 for valid ones */
> > nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
> >
> > and clang makes a *horrid* mess of it, generating this code:
> >
> > movl %edi, %r14d
> > movq 32(%rbx), %rdx
> > movl (%rdx), %eax
> > movq %rax, 8(%rsp)
> > cmpq 8(%rsp), %r14
> > sbbq %rcx, %rcx
> >
> > which is just crazy. Notice how it does that "move rax to stack, then
> > do the compare against the stack", instead of just using %rax.
> >
> > In fact, that function shouldn't have a stack frame at all, and the
> > only reason it is generated is because of this whole oddity.
> >
> > All clang's fault, right?
> >
> > Yeah, mostly. But it turns out that what really messes with clangs
> > little head is that the x86 array_index_mask_nospec() function is
> > being a bit annoying.
> >
> > This is what we do:
> >
> > static __always_inline unsigned long
> > array_index_mask_nospec(unsigned long index,
> > unsigned long size)
> > {
> > unsigned long mask;
> >
> > asm volatile ("cmp %1,%2; sbb %0,%0;"
> > :"=r" (mask)
> > :"g"(size),"r" (index)
> > :"cc");
> > return mask;
> > }
> >
> > and look at the use again:
> >
> > nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
> >
> > here all the values are actually 'unsigned int'. So what happens is
> > that clang can't just use the fdt->max_fds value *directly* from
> > memory, because it needs to be expanded from 32-bit to 64-bit because
> > we've made our array_index_mask_nospec() function only work on 64-bit
> > 'unsigned long' values.
> >
> > So it turns out that by massaging this a bit, and making it just be a
> > macro - so that the asm can decide that "I can do this in 32-bit" - I
> > can get clang to generate much better code.
> >
> > Clang still absolutely hates the "g" constraint, so to get clang to
> > really get this right I have to use "ir" instead of "g". Which is
> > wrong. Because gcc does this right, and could use the memory op
> > directly. But even gcc cannot do that with our *current* function,
> > because of that "the memory value is 32-bit, we require a 64-bit
> > value"
> >
> > Anyway, I can get gcc to generate the right code:
> >
> > movq 32(%r13), %rdx
> > cmp (%rdx),%ebx
> > sbb %esi,%esi
> >
> > which is basically the right code for the six crazy instructions clang
> > generates. And if I make the "g" be "ir", I can get clang to generate
> >
> > movq 32(%rdi), %rcx
> > movl (%rcx), %eax
> > cmpl %eax, %esi
> > sbbl %esi, %esi
> >
> > which is the same thing, but with that (pointless) load to a register.
> >
> > And now clang doesn't generate that stack frame at all.
> >
> > Anyway, this was a long email to explain the odd attached patch.
> >
> > Comments? Note that this patch is *entirely* untested, I have done
> > this purely by looking at the code generation in fs/file.c.

FYI, please note that gcc-12 is able to synthesize carry-flag compares
on its own:

--cut here--
unsigned long foo (unsigned long index, unsigned long size)
{
return (index <= size) ? 0 : -1;
}

extern unsigned int size;

unsigned long bar (unsigned int index)
{
return (index <= size) ? 0 : -1;
}
--cut here--

gcc-12 -O2:

foo:
cmpq %rdi, %rsi
sbbq %rax, %rax
ret
bar:
cmpl %edi, size(%rip)
sbbq %rax, %rax
ret

Uros.

> >
> > Linus
>
> > arch/x86/include/asm/barrier.h | 23 +++++++++--------------
> > 1 file changed, 9 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> > index 66e57c010392..6159d2cbbfde 100644
> > --- a/arch/x86/include/asm/barrier.h
> > +++ b/arch/x86/include/asm/barrier.h
> > @@ -33,20 +33,15 @@
> > * Returns:
> > * 0 - (index < size)
> > */
> > -static __always_inline unsigned long array_index_mask_nospec(unsigned long index,
> > - unsigned long size)
> > -{
> > - unsigned long mask;
> > -
> > - asm volatile ("cmp %1,%2; sbb %0,%0;"
> > - :"=r" (mask)
> > - :"g"(size),"r" (index)
> > - :"cc");
> > - return mask;
> > -}
> > -
> > -/* Override the default implementation from linux/nospec.h. */
> > -#define array_index_mask_nospec array_index_mask_nospec
> > +#define array_index_mask_nospec(idx,sz) ({ \
> > + typeof((idx)+(sz)) __idx = (idx); \
> > + typeof(__idx) __sz = (sz); \
> > + typeof(__idx) __mask; \
> > + asm volatile ("cmp %1,%2; sbb %0,%0" \
> > + :"=r" (__mask) \
> > + :"ir"(__sz),"r" (__idx) \
> > + :"cc"); \
> > + __mask; })
> >
> > /* Prevent speculative execution past this barrier. */
> > #define barrier_nospec() asm volatile("lfence":::"memory")
>