Re: [RFC PATCH] riscv: Optimize gcd() performance by selecting CPU_NO_EFFICIENT_FFS

From: Alexandre Ghiti
Date: Fri Mar 28 2025 - 10:10:07 EST


Hi Kuan-Wei,

First, sorry for the late review.

On 17/02/2025 02:37, Kuan-Wei Chiu wrote:
When the Zbb extension is not supported, ffs() falls back to a software
implementation instead of leveraging the hardware ctz instruction for
fast computation. In such cases, selecting CPU_NO_EFFICIENT_FFS
optimizes the efficiency of gcd().

The implementation of gcd() depends on the CPU_NO_EFFICIENT_FFS option.
With hardware support for ffs, the binary GCD algorithm is used.
Without it, the odd-even GCD algorithm is employed for better
performance.

Co-developed-by: Yu-Chun Lin <eleanor15x@xxxxxxxxx>
Signed-off-by: Yu-Chun Lin <eleanor15x@xxxxxxxxx>
Signed-off-by: Kuan-Wei Chiu <visitorckw@xxxxxxxxx>
---
Although selecting NO_EFFICIENT_FFS seems reasonable without ctz
instructions, this patch hasn't been tested on real hardware. We'd
greatly appreciate it if someone could help test and provide
performance numbers!

arch/riscv/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 7612c52e9b1e..2dd3699ad09b 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -91,6 +91,7 @@ config RISCV
select CLINT_TIMER if RISCV_M_MODE
select CLONE_BACKWARDS
select COMMON_CLK
+ select CPU_NO_EFFICIENT_FFS if !RISCV_ISA_ZBB
select CPU_PM if CPU_IDLE || HIBERNATION || SUSPEND
select EDAC_SUPPORT
select FRAME_POINTER if PERF_EVENTS || (FUNCTION_TRACER && !DYNAMIC_FTRACE)


So your patch is correct. But building a kernel with RISCV_ISA_ZBB does not mean the platform actually supports Zbb, and on a platform without it we'd still use the slow version of gcd().

So I would use static keys instead. Can you try to come up with a patch that does that?
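
To illustrate the idea, here is a rough userspace sketch, not a kernel patch and not tested on RISC-V hardware. The names have_zbb, gcd_binary(), gcd_odd_even() and gcd_selftest() are made up for this example. The plain bool stands in for a static key, which in the kernel would presumably be declared with DEFINE_STATIC_KEY_FALSE() and tested with static_branch_likely() once Zbb has been detected at boot. __builtin_ctzl() (a GCC/Clang builtin) stands in for __ffs(). The two loops are modelled on the two variants in lib/math/gcd.c as I understand them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a static key flipped when Zbb is detected. */
static bool have_zbb;

static inline void swap_ul(unsigned long *a, unsigned long *b)
{
	unsigned long t = *a;

	*a = *b;
	*b = t;
}

/* Binary GCD: only worthwhile when count-trailing-zeros is cheap. */
static unsigned long gcd_binary(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;

	if (!a || !b)
		return r;

	b >>= __builtin_ctzl(b);
	if (b == 1)
		return r & -r;

	for (;;) {
		a >>= __builtin_ctzl(a);
		if (a == 1)
			return r & -r;
		if (a == b)
			return a << __builtin_ctzl(r);
		if (a < b)
			swap_ul(&a, &b);
		a -= b;
	}
}

/* Odd-even GCD: shifts one bit at a time, no ffs needed. */
static unsigned long gcd_odd_even(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;

	if (!a || !b)
		return r;

	r &= -r;	/* isolate the lowest set bit of a | b */

	while (!(b & r))
		b >>= 1;
	if (b == r)
		return r;

	for (;;) {
		while (!(a & r))
			a >>= 1;
		if (a == r)
			return r;
		if (a == b)
			return a;
		if (a < b)
			swap_ul(&a, &b);
		a -= b;
		a >>= 1;
		if (a & r)
			a += b;
		a >>= 1;
	}
}

/* Runtime selection instead of a build-time Kconfig choice. */
unsigned long gcd(unsigned long a, unsigned long b)
{
	return have_zbb ? gcd_binary(a, b) : gcd_odd_even(a, b);
}

/* Sanity check: both variants must agree on small inputs. */
void gcd_selftest(void)
{
	for (unsigned long a = 0; a < 64; a++)
		for (unsigned long b = 0; b < 64; b++)
			assert(gcd_binary(a, b) == gcd_odd_even(a, b));
}
```

The point of the runtime check is that one kernel image built with RISCV_ISA_ZBB can still pick the odd-even loop on hardware that lacks Zbb. With a real static key the branch would be patched once at boot rather than evaluated on every call.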

Thanks,

Alex