Re: [PATCH v3 2/2] x86/asm/bitops: __ffs,ffz: use __builtin_ctzl to evaluate constant expressions

From: Nick Desaulniers
Date: Wed May 11 2022 - 20:20:13 EST


On Wed, May 11, 2022 at 5:04 PM Vincent Mailhol
<mailhol.vincent@xxxxxxxxxx> wrote:
>
> __ffs(x) is equivalent to (unsigned long)__builtin_ctzl(x) and ffz(x)
> is equivalent to (unsigned long)__builtin_ctzl(~x). Because
> __builtin_ctzl() returns an int, a cast to (unsigned long) is
> necessary to avoid potential warnings on implicit casts.
>
> For x86_64, the current __ffs() and ffz() implementations do not
> produce optimized code when called with a constant expression. In
> contrast, __builtin_ctzl() gets simplified into a single
> instruction.
>
> However, for non-constant expressions, the kernel's asm versions of
> __ffs() and ffz() remain slightly better than the code produced by
> GCC (GCC emits a useless instruction to clear eax).
>
> This patch uses __builtin_constant_p() to select between the
> kernel's __ffs()/ffz() and the __builtin_ctzl() depending on whether
> the argument is constant or not.
>
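
For readers following along, the selection described above can be sketched
roughly as follows. This is only an illustration, not the patch itself: the
names sketch_asm_ffs(), sketch_ffs() and sketch_ffz() are hypothetical, and a
portable loop stands in for the kernel's asm implementation so the snippet
builds outside the kernel tree with GCC or Clang.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the kernel's asm-based __ffs(); NOT the real
 * implementation. Like the original, the result is undefined for word == 0
 * (here the loop would simply never terminate).
 */
static inline unsigned long sketch_asm_ffs(unsigned long word)
{
	unsigned long bit = 0;

	/* Shift right until the lowest set bit reaches position 0. */
	while (!(word & 1UL)) {
		word >>= 1;
		bit++;
	}
	return bit;
}

/*
 * Constant arguments go to the builtin so the compiler can fold them;
 * everything else takes the (stand-in) asm path. The cast is needed
 * because __builtin_ctzl() returns an int.
 */
#define sketch_ffs(word)					\
	(__builtin_constant_p(word) ?				\
	 (unsigned long)__builtin_ctzl(word) :			\
	 sketch_asm_ffs(word))

/* The first zero bit of x is the first set bit of ~x. */
#define sketch_ffz(word) sketch_ffs(~(word))
```

With optimization enabled, a call such as sketch_ffs(8UL) is expected to be
folded to the constant 3 at compile time, while a call on a runtime value
falls through to the stand-in function. Whether a given argument counts as
constant depends on the compiler and optimization level.
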
> ** Statistics **
>
> On an allyesconfig build, before applying this patch...:
>
> | $ objdump -d vmlinux.o | grep tzcnt | wc -l
> | 3607
>
> ...and after:
>
> | $ objdump -d vmlinux.o | grep tzcnt | wc -l
> | 2600
>
> So, roughly 27.9% of the calls to either __ffs() or ffz() were using
> constant expressions and could be optimized out.
>
> (tests done on linux v5.18-rc5 x86_64 using GCC 11.2.1)
>
> Note: on x86_64, the asm bsf instruction produces tzcnt when used with
> the rep prefix (which is why we grep tzcnt instead of bsf in the above
> benchmark). cf. [1]
>
> [1] commit e26a44a2d618 ("x86: Use REP BSF unconditionally")
> http://lkml.kernel.org/r/5058741E020000780009C014@xxxxxxxxxxxxxxxxxxxx
>
> CC: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>
> Signed-off-by: Vincent Mailhol <mailhol.vincent@xxxxxxxxxx>

Thanks for the patches!
Reviewed-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>


--
Thanks,
~Nick Desaulniers