Re: [PATCH v2 2/2] bitops: Optimize fns() for improved performance

From: Yury Norov
Date: Tue Apr 30 2024 - 13:36:48 EST


On Tue, Apr 30, 2024 at 01:49:12PM +0800, Kuan-Wei Chiu wrote:
> The current fns() repeatedly uses __ffs() to find the index of the
> least significant bit and then clears the corresponding bit using
> __clear_bit(). The method for clearing the least significant bit can be
> optimized by using word &= word - 1 instead.
>
> Typically, the execution time of one __ffs() plus one __clear_bit() is
> longer than that of a bitwise AND operation and a subtraction. To
> improve performance, the loop for clearing the least significant bit
> has been replaced with word &= word - 1, followed by a single __ffs()
> operation to obtain the answer. This change reduces the number of
> __ffs() iterations from n to just one, enhancing overall performance.
>
> This change speeds up the find_nth_bit() in the find_bit benchmark by
> approximately ~1.26 times:

26% sounds better.

> Before:
> find_nth_bit: 4254313 ns, 16525 iterations
>
> After:
> find_nth_bit: 3362863 ns, 16501 iterations
>
> The following microbenchmark data, conducted on my x86-64 machine,
> shows the execution time (in microseconds) required for 1000000 test
> data generated by get_random_u64() and executed by fns() under
> different values of n:
>
> +-----+---------------+---------------+
> | n | time_old | time_new |
> +-----+---------------+---------------+
> | 0 | 29194 | 25878 |
> | 1 | 25510 | 25497 |
> | 2 | 27836 | 25721 |
> | 3 | 30140 | 25673 |
> | 4 | 32569 | 25426 |
> | 5 | 34792 | 25690 |
> | 6 | 37117 | 25651 |
> | 7 | 39742 | 25383 |
> | 8 | 42360 | 25657 |
> | 9 | 44672 | 25897 |
> | 10 | 47237 | 25819 |
> | 11 | 49884 | 26530 |
> | 12 | 51864 | 26647 |
> | 13 | 54265 | 28915 |
> | 14 | 56440 | 28373 |
> | 15 | 58839 | 28616 |
> | 16 | 62383 | 29128 |
> | 17 | 64257 | 30041 |
> | 18 | 66805 | 29773 |
> | 19 | 69368 | 33203 |
> | 20 | 72942 | 33688 |
> | 21 | 77006 | 34518 |
> | 22 | 80926 | 34298 |
> | 23 | 85723 | 35586 |
> | 24 | 90324 | 36376 |
> | 25 | 95992 | 37465 |
> | 26 | 101101 | 37599 |
> | 27 | 106520 | 37466 |
> | 28 | 113287 | 38163 |
> | 29 | 120552 | 38810 |
> | 30 | 128040 | 39373 |
> | 31 | 135624 | 40500 |
> | 32 | 142580 | 40343 |
> | 33 | 148915 | 40460 |
> | 34 | 154005 | 41294 |
> | 35 | 157996 | 41730 |
> | 36 | 160806 | 41523 |
> | 37 | 162975 | 42088 |
> | 38 | 163426 | 41530 |
> | 39 | 164872 | 41789 |
> | 40 | 164477 | 42505 |
> | 41 | 164758 | 41879 |
> | 42 | 164182 | 41415 |
> | 43 | 164842 | 42119 |
> | 44 | 164881 | 42297 |
> | 45 | 164870 | 42145 |
> | 46 | 164673 | 42066 |
> | 47 | 164616 | 42051 |
> | 48 | 165055 | 41902 |
> | 49 | 164847 | 41862 |
> | 50 | 165171 | 41960 |
> | 51 | 164851 | 42089 |
> | 52 | 164763 | 41717 |
> | 53 | 164635 | 42154 |
> | 54 | 164757 | 41983 |
> | 55 | 165095 | 41419 |
> | 56 | 164641 | 42381 |
> | 57 | 164601 | 41654 |
> | 58 | 164864 | 41834 |
> | 59 | 164594 | 41920 |
> | 60 | 165207 | 42020 |
> | 61 | 165056 | 41185 |
> | 62 | 165160 | 41722 |
> | 63 | 164923 | 41702 |
> | 64 | 164777 | 41880 |
> +-----+---------------+---------------+

Please, just a total gross in the commit message.

> Signed-off-by: Kuan-Wei Chiu <visitorckw@xxxxxxxxx>
> ---
>
> Changes in v2:
> - Change the loop in fns() by counting down from n to 0.
> - Add find_bit benchmark result for find_nth_bit in commit message.
>
> include/linux/bitops.h | 12 +++---------
> 1 file changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 2ba557e067fe..57ecef354f47 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -254,16 +254,10 @@ static inline unsigned long __ffs64(u64 word)
> */
> static inline unsigned long fns(unsigned long word, unsigned int n)
> {
> - unsigned int bit;
> + while (word && n--)
> + word &= word - 1;
>
> - while (word) {
> - bit = __ffs(word);
> - if (n-- == 0)
> - return bit;
> - __clear_bit(bit, &word);
> - }
> -
> - return BITS_PER_LONG;
> + return word ? __ffs(word) : BITS_PER_LONG;
> }
>
> /**
> --
> 2.34.1