[PATCH v4 2/2] bitops: Optimize fns() for improved performance

From: Kuan-Wei Chiu
Date: Wed May 01 2024 - 09:21:23 EST


The current fns() repeatedly calls __ffs() to find the index of the
least significant set bit and then clears that bit with __clear_bit().
Clearing the least significant set bit can be done more cheaply with
word &= word - 1.

Typically, one __ffs() plus one __clear_bit() takes longer than one
subtraction plus one bitwise AND. Replace the loop body with
word &= word - 1 to drop the n lowest set bits, then call __ffs() once
on what remains to obtain the answer. This reduces the number of
__ffs() calls from one per loop iteration to a single call.
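
As a rough illustration of the two approaches described above, here is a
minimal userspace sketch. It is not the kernel code: ffs_ul() is a
hypothetical stand-in for __ffs() built on the GCC/Clang builtin
__builtin_ctzl(), BITS_PER_LONG is redefined locally, and the names
fns_old(), fns_new() and fns_selfcheck() exist only for this example.

```c
#include <assert.h>
#include <limits.h>

/* Local stand-in for the kernel macro (assumption for this sketch). */
#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Userspace stand-in for __ffs(); result is undefined for word == 0. */
static inline unsigned long ffs_ul(unsigned long word)
{
	return (unsigned long)__builtin_ctzl(word);
}

/* Old approach: one find-first-set and one bit clear per iteration. */
static unsigned long fns_old(unsigned long word, unsigned int n)
{
	while (word) {
		unsigned long bit = ffs_ul(word);

		if (n-- == 0)
			return bit;
		word &= ~(1UL << bit);
	}
	return BITS_PER_LONG;
}

/* New approach: drop the n lowest set bits, then find-first-set once. */
static unsigned long fns_new(unsigned long word, unsigned int n)
{
	while (word && n--)
		word &= word - 1;	/* clears the lowest set bit */

	return word ? ffs_ul(word) : BITS_PER_LONG;
}

/* Spot-check that both versions agree on a few inputs. */
static void fns_selfcheck(void)
{
	static const unsigned long words[] = { 0UL, 1UL, 0x2cUL, ~0UL };
	unsigned int i, n;

	for (i = 0; i < sizeof(words) / sizeof(words[0]); i++)
		for (n = 0; n <= BITS_PER_LONG; n++)
			assert(fns_old(words[i], n) == fns_new(words[i], n));
}
```

For example, with word = 0x2c (bits 2, 3 and 5 set), fns_new(0x2c, 1)
clears bit 2 and then returns 3, while fns_new(0x2c, 3) runs out of set
bits and returns BITS_PER_LONG.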

This makes fns() approximately 439 times faster in the test_bitops
benchmark and speeds up find_nth_bit() in the find_bit benchmark by
approximately 26%.

Before:
test_bitops: fns: 5876762553 ns, 64000000 iterations
find_nth_bit: 4254313 ns, 16525 iterations

After:
test_bitops: fns: 13388431 ns, 64000000 iterations
find_nth_bit: 3362863 ns, 16501 iterations
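
The quoted speedups can be re-derived from the figures above. The
following is only a sketch of that arithmetic; the helper names are
hypothetical. Because the two find_nth_bit runs report slightly
different iteration counts, the second helper compares time per
iteration rather than raw totals.

```c
#include <assert.h>

/* Ratio of old to new total runtime; > 1.0 means the new code is faster. */
static double speedup_total(double before_ns, double after_ns)
{
	return before_ns / after_ns;
}

/* Same ratio, normalized by the iteration count of each run. */
static double speedup_per_iter(double before_ns, double before_iters,
			       double after_ns, double after_iters)
{
	return (before_ns / before_iters) / (after_ns / after_iters);
}

/* Check the reported figures against the benchmark numbers above. */
static void check_reported_figures(void)
{
	double fns = speedup_total(5876762553.0, 13388431.0);
	double nth = speedup_per_iter(4254313.0, 16525.0,
				      3362863.0, 16501.0);

	assert(fns > 438.0 && fns < 440.0);	/* roughly 439x */
	assert(nth > 1.25 && nth < 1.28);	/* roughly 26% faster */
}
```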

Signed-off-by: Kuan-Wei Chiu <visitorckw@xxxxxxxxx>
---
include/linux/bitops.h | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 2ba557e067fe..57ecef354f47 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -254,16 +254,10 @@ static inline unsigned long __ffs64(u64 word)
  */
 static inline unsigned long fns(unsigned long word, unsigned int n)
 {
-	unsigned int bit;
+	while (word && n--)
+		word &= word - 1;
 
-	while (word) {
-		bit = __ffs(word);
-		if (n-- == 0)
-			return bit;
-		__clear_bit(bit, &word);
-	}
-
-	return BITS_PER_LONG;
+	return word ? __ffs(word) : BITS_PER_LONG;
 }
 
 /**
--
2.34.1