Re: [PATCH] riscv: lib: optimize strlen loop efficiency

In reply to: Feng Jiang: "Re: [PATCH] riscv: lib: optimize strlen loop efficiency"

From: David Laight

Date: Thu Jan 15 2026 - 06:19:50 EST

On Wed, 14 Jan 2026 19:03:17 -0700 (MST)
Paul Walmsley <pjw@xxxxxxxxxx> wrote:

> On Thu, 18 Dec 2025, Feng Jiang wrote:
>
> > Optimize the generic strlen implementation by using a pre-decrement
> > pointer. This reduces the loop body from 4 instructions to 3 and
> > eliminates the unconditional jump ('j').
> >
> > Old loop (4 instructions, 2 branches):
> > 1: lbu t0, 0(t1); beqz t0, 2f; addi t1, t1, 1; j 1b
> >
> > New loop (3 instructions, 1 branch):
> > 1: addi t1, t1, 1; lbu t0, 0(t1); bnez t0, 1b

Is that a change to the generic C code?
Testing (++sc)[-1] might do the trick without requiring the extra read
of the first location.
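
For illustration, a minimal C sketch of that loop shape. The function name
strlen_preinc is hypothetical and this is not taken from the patch or from
lib/string.c; whether the compiler actually emits the 3-instruction loop
would need checking in the generated code.

```c
#include <stddef.h>

/* Hypothetical name, for illustration only. */
size_t strlen_preinc(const char *s)
{
	const char *sc = s;

	/*
	 * Increment first, then test the byte just stepped over.
	 * The loop has a single conditional branch and s[0] is only
	 * read once, inside the loop.
	 */
	while ((++sc)[-1] != '\0')
		;

	/* sc ends up one past the terminating NUL. */
	return (size_t)(sc - s) - 1;
}
```

The trailing "- 1" is the price of the pre-increment: the pointer overshoots
the NUL by one, so the adjustment moves out of the loop to the exit path.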

> >
> > This change improves execution efficiency and reduces branch pressure
> > for systems without the Zbb extension.
>
> Looks reasonable; do you have any benchmarks on hardware that you can
> share? Any reason why this patch stands alone and isn't rolled up as part
> of your "optimize string function" series?

For 64bit you can do a lot better (in C) by loading 64bit words and doing
the correct 'shift and mask' sequence to detect a zero byte.
It usually isn't worth it for 32bit.

It does need to handle a misaligned base, e.g. by masking the low bits off
the base pointer and or'ing non-zero values into the bytes of the first
word that precede the string.
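
A rough sketch of the word-at-a-time approach, under some loudly stated
assumptions: little-endian byte order (as on RISC-V), 64-bit words, and that
reading a whole aligned word containing part of the string is acceptable
even when it extends past the string (an aligned 8-byte read cannot cross a
page boundary, but it is outside what ISO C strictly permits). The names
strlen_word, zero_byte_mask and load_word are hypothetical, and this is
untested on real hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Non-zero if (and only if) some byte of v is zero. */
static uint64_t zero_byte_mask(uint64_t v)
{
	return (v - ONES) & ~v & HIGHS;
}

/* Load one aligned 64-bit word; memcpy sidesteps aliasing issues. */
static uint64_t load_word(const unsigned char *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Hypothetical name; assumes little-endian. */
size_t strlen_word(const char *s)
{
	unsigned int offset = (uintptr_t)s & 7;
	/* Mask the low bits off the base pointer. */
	const unsigned char *wp = (const unsigned char *)s - offset;
	uint64_t v = load_word(wp);
	size_t i = 0;

	/*
	 * Bytes before s sit in the low-order bytes of the first word
	 * (little-endian). Force them non-zero so they cannot be
	 * mistaken for the terminator.
	 */
	if (offset)
		v |= ~0ULL >> (64 - 8 * offset);

	/* Advance a word at a time until one contains a zero byte. */
	while (!zero_byte_mask(v)) {
		wp += 8;
		v = load_word(wp);
	}

	/* Find the first zero byte within that word. */
	while ((v >> (8 * i)) & 0xff)
		i++;

	return (size_t)(wp + i - (const unsigned char *)s);
}
```

The final byte scan is kept simple here; a count-trailing-zeros on the mask
would do the same job in fewer instructions where the hardware has one.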

David