Re: [RFC PATCH 1/2] bits: introduce ffs_val()

In reply to: Arnd Bergmann: "Re: [RFC PATCH 1/2] bits: introduce ffs_val()"

From: David Laight

Date: Sat Jan 10 2026 - 05:50:09 EST

On Fri, 9 Jan 2026 17:37:56 +0100
Petr Tesarik <ptesarik@xxxxxxxx> wrote:

> Introduce a macro that can efficiently extract the least significant
> non-zero bit from a value.
>
> Interestingly, this bit-twiddling trick is open-coded in some places, but
> it also appears to be little known, leading to various inefficient
> implementations in other places. Let's make it part of the standard bitops
> arsenal.

I'm not sure whether ffs_val(x) is actually more readable than an
open-coded (x & -x).
If you don't know what either means you have to look it up or work
it out.
The latter just requires a bit of thought, the former searching through
the source tree for the correct header and then believing the comment
or, again, working out what it does.

That said, I'm not objecting to adding it, but the churn of changing
existing code is probably not worth the effort.

I'd also define it as x & (~x + 1), which makes it a lot more obvious
why it is correct; the compiler will convert it to a signed negate.
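
To illustrate why the two spellings agree, here is a minimal userspace
sketch. The helper names are made up for this example and are not from
the patch. Inverting x turns the trailing zeros into ones; adding 1 carries
through them and sets the bit where x had its lowest one, while every
higher bit stays inverted, so the AND leaves only that bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names, for illustration only (not from the patch). */

/* Lowest set bit via unsigned negate; returns 0 for x == 0. */
static inline uint32_t lsb_neg(uint32_t x)
{
	return x & -x;
}

/* Same value spelled out: in unsigned arithmetic -x == ~x + 1. */
static inline uint32_t lsb_not_plus_1(uint32_t x)
{
	return x & (~x + 1);
}

/* Check that the two spellings agree on a few sample values. */
static inline void lsb_selfcheck(void)
{
	static const uint32_t samples[] = { 0, 1, 0x28, 0xf00, 0x80000000u };
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(lsb_neg(samples[i]) == lsb_not_plus_1(samples[i]));
}
```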

Also, as I pointed out earlier, many modern CPUs have an instruction
for ffs(). While x & -x is usually better than 1u << __ffs(x), the same
is not true for y * (x & -x) versus y << __ffs(x).
In particular, on Zen4/5 bsf (used for __ffs) has a latency of 1 but the
multiply has a latency of 3.
Mainstream Intel x86 CPUs all have a latency of 3 for both imul and bsf.
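
The two forms being compared can be sketched as below. This is a userspace
approximation: the function names are hypothetical, and the GCC/Clang
builtin __builtin_ctz() stands in for the kernel's __ffs(). Like __ffs(),
it is undefined for x == 0, so only the multiply form tolerates a zero mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; __builtin_ctz() is an assumed stand-in for __ffs(). */

/* Move y up to the lowest set bit of x using a multiply. */
static inline uint32_t place_mul(uint32_t y, uint32_t x)
{
	return y * (x & -x);
}

/* Same result using a shift by the bit index; x must be non-zero. */
static inline uint32_t place_shift(uint32_t y, uint32_t x)
{
	return y << __builtin_ctz(x);
}

/* Check that both forms agree for a few non-zero masks. */
static inline void place_selfcheck(void)
{
	static const uint32_t masks[] = { 0x1, 0x30, 0xf00, 0x80000000u };
	unsigned int i;

	for (i = 0; i < sizeof(masks) / sizeof(masks[0]); i++)
		assert(place_mul(5, masks[i]) == place_shift(5, masks[i]));
}
```

Which one is faster depends on the relative latency of the multiply and
the bit-scan instruction on the target CPU, as noted above.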

There should be #define definitions of is_power_of_2_or_zero(x) as
!(x & (x - 1)) and is_power_of_2(x) as (x && is_power_of_2_or_zero(x))
in the same header.
But there is only an inline is_power_of_2(unsigned long) in log2.h.
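
A rough sketch of what such type-generic macros could look like, using the
usual x & (x - 1) test (clearing the lowest set bit leaves zero only when
at most one bit was set). This is an assumption about the shape, not
existing kernel code, and both macros evaluate their argument more than
once.

```c
#include <assert.h>

/* Sketch only: arguments are evaluated more than once. */
#define is_power_of_2_or_zero(x)	(!((x) & ((x) - 1)))
#define is_power_of_2(x)		((x) && is_power_of_2_or_zero(x))

/* Exercise the macros with a few unsigned types. */
static inline void pow2_selfcheck(void)
{
	assert(is_power_of_2_or_zero(0u));
	assert(!is_power_of_2(0u));
	assert(is_power_of_2(1u));
	assert(is_power_of_2(1ull << 40));
	assert(!is_power_of_2(12ul));
}
```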

David