Re: [RFC PATCH 1/2] bits: introduce ffs_val()

Next message: Sebastian Reichel: "Re: [PATCH] gpio: rockchip: mark the GPIO controller as sleeping"
Previous message: Shuah: "Re: [PATCH v5 0/4] selftests/resctrl: Add Hygon CPUs support and bug fixes"
In reply to: Arnd Bergmann: "Re: [RFC PATCH 1/2] bits: introduce ffs_val()"
Next in thread: Yury Norov: "Re: [RFC PATCH 1/2] bits: introduce ffs_val()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

From: David Laight

Date: Fri Jan 09 2026 - 18:54:34 EST

On Fri, 09 Jan 2026 18:01:02 +0100
"Arnd Bergmann" <arnd@xxxxxxxx> wrote:

> On Fri, Jan 9, 2026, at 17:37, Petr Tesarik wrote:
>
> > + * Returns:
> > + * least significant non-zero bit, 0 if all bits are zero
> > + */
> > +#define ffs_val(x) \
> > +({ \
> > + const typeof(x) val__ = (x); \
> > + val__ & -val__; \
> > +})
>
> This looks good to me, but I'd suggest using 'const auto val__'
> instead of typeof(), to reduce expanding complex arguments twice.

It is more usual to just use a single _ prefix and the same name.
I wouldn't bother with 'const' either, maybe:
#define ffs_val(val) ({ \
auto _val = val; \
_val & -_val; \
})

However it isn't necessarily better than using __ffs().
FIELD_PREP(mask, val) (for non-constant mask) can be (mask & -mask) * val
or val << __ffs(mask).
So the 'ffs' version is fewer instructions (assuming non-zero mask).
The timings for bsf/bsr are similar to those for imul on intel cpu,
mul wins on zen3 and bsf/bsr on zen4.
On balance the __ffs() version is actually likely to be faster.

Other architectures may fair better or worse.
Clearly you don't want to use __ffs() unless it is a single instruction.

Of course, for FIELD_GET(reg, mask) you'd need reg/(mask & -mask)
and the cost of the integer division is far more than __ffs().
(But that will give you a compile-time error if mask is a constant zero
and the compiler will convert the divide to a shift.)

And for 64bit calculations on 32bit 'all bets are off'.
Even the shift left might be problematic.

David

>
> Arnd
>