Re: [PATCH v2 1/2] x86/asm/bitops: ffs: use __builtin_ffs to evaluate constant expressions

From: Nick Desaulniers
Date: Wed May 11 2022 - 17:36:17 EST


On Wed, May 11, 2022 at 9:03 AM Vincent Mailhol
<mailhol.vincent@xxxxxxxxxx> wrote:
>
> For x86_64, the current ffs() implementation does not produce
> optimized code when called with a constant expression. On the
> contrary, the __builtin_ffs() function of both GCC and clang is able
> to simplify the expression into a single instruction.
>
> * Example *
>
> Let's consider two dummy functions foo() and bar() as below:
>
> | #include <linux/bitops.h>
> | #define CONST 0x01000000
> |
> | unsigned int foo(void)
> | {
> | return ffs(CONST);
> | }
> |
> | unsigned int bar(void)
> | {
> | return __builtin_ffs(CONST);
> | }
>
> GCC would produce below assembly code:

Thanks for the patch! LGTM.
Reviewed-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>

>
> | 0000000000000000 <foo>:
> | 0: ba 00 00 00 01 mov $0x1000000,%edx
> | 5: b8 ff ff ff ff mov $0xffffffff,%eax
> | a: 0f bc c2 bsf %edx,%eax
> | d: 83 c0 01 add $0x1,%eax
> | 10: c3 ret

This should be the end of foo. I...actually don't know what's at the
end here. But I don't think the region from here...

> | 11: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
> | 18: 00 00 00 00
> | 1c: 0f 1f 40 00 nopl 0x0(%rax)

...to here is relevant.


> |
> | 0000000000000020 <bar>:
> | 20: b8 19 00 00 00 mov $0x19,%eax
> | 25: c3 ret
>
> And clang would produce:
>
> | 0000000000000000 <foo>:
> | 0: b8 ff ff ff ff mov $0xffffffff,%eax
> | 5: 0f bc 05 00 00 00 00 bsf 0x0(%rip),%eax # c <foo+0xc>
> | c: 83 c0 01 add $0x1,%eax
> | f: c3 ret

Weird, so I just tried this:
```
$ cat /tmp/x.c
#define CONST 0x01000000

unsigned ffs (int x) {
int r;
asm("bsfl %1,%0"
: "=r" (r)
: "rm" (x), "0" (-1));
return r;
}

unsigned int foo(void) {
return ffs(CONST);
}

unsigned int bar(void) {
return __builtin_ffs(CONST);
}
$ clang /tmp/x.c -O2 -o /tmp/x.o -c && llvm-objdump -dr /tmp/x.o
--disassemble-symbols=foo
...
0000000000000010 <foo>:
10: b8 19 00 00 00 movl $25, %eax
15: c3 retq
16: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
```
but if we make `ffs` `static`, we get:
```
0000000000000000 <foo>:
0: b8 ff ff ff ff movl $4294967295, %eax
# imm = 0xFFFFFFFF
5: 0f bc 05 00 00 00 00 bsfl (%rip), %eax
# 0xc <foo+0xc>
0000000000000008: R_X86_64_PC32 .LCPI0_0-0x4
c: c3 retq
d: 0f 1f 00 nopl (%rax)
```
Which is very interesting to me; it looks like constant propagation
actually hurt optimization, we lost that this was a libcall which we
could have optimized.

As in LLVM does:
1. sink CONST into ffs; it's static and has one caller
2. delete x parameter; it's unused
3. now libcall optimization just sees a call to ffs with no params,
that doesn't match the signature of libc.

Your change should fix that since we don't even call a function named
ffs if we have a constant (explicitly, or via optimization). Filed
https://github.com/llvm/llvm-project/issues/55394
--
Thanks,
~Nick Desaulniers