For x86_64, the current ffs() implementation does not produce
optimized code when called with a constant expression. By contrast,
the __builtin_ffs() function of both GCC and clang can simplify such
an expression into a single instruction.
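
As a rough illustration of the idea, the sketch below dispatches
constant expressions to __builtin_ffs() and everything else to a
fallback. The names ffs_sketch(), variable_ffs_sketch() and
check_ffs_sketch() are hypothetical, and the portable loop only stands
in for the existing inline-asm path; this is not the code of the patch
itself.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the existing variable path. The real
 * implementation uses inline asm; a portable loop is assumed here
 * so that the sketch stays self-contained.
 */
static inline int variable_ffs_sketch(int x)
{
	unsigned int v = (unsigned int)x;
	int r;

	if (v == 0)
		return 0;	/* ffs(0) is defined as 0 */

	/* Count up to the lowest set bit, 1-indexed. */
	for (r = 1; !(v & 1); r++)
		v >>= 1;
	return r;
}

/*
 * Compile-time constants go to __builtin_ffs(), which the compiler
 * can fold; everything else falls back to the variable path.
 */
#define ffs_sketch(x) \
	(__builtin_constant_p(x) ? __builtin_ffs(x) : variable_ffs_sketch(x))

/* Small self-check: both paths must agree with the builtin. */
static void check_ffs_sketch(int runtime_value)
{
	assert(ffs_sketch(0x10) == 5);	/* constant: bit 4 set */
	assert(ffs_sketch(runtime_value) == __builtin_ffs(runtime_value));
}
```

With a constant argument such as ffs_sketch(0x10), the compiler is
expected to fold the whole expression at build time, so no bsf
instruction needs to be emitted for that call site.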
** Statistics **
On an allyesconfig build, before applying this patch...:
| $ objdump -d vmlinux.o | grep bsf | wc -l
| 1081
...and after:
| $ objdump -d vmlinux.o | grep bsf | wc -l
| 792
So, roughly 26.7% of the calls to ffs() used constant expressions
and were optimized out.