RE: [PATCH] riscv: fix memmove and optimise memcpy when misalign

From: David Laight
Date: Tue Jun 15 2021 - 10:08:33 EST


From: Bin Meng
> Sent: 15 June 2021 14:40
...
> > I prefer C versions as well, and actually before commit 04091d6 we were
> > indeed using the generic C version. The issue is that 04091d6
> > introduces an assembly version that's very broken. It does not offer
> > any performance improvement over the C version, and breaks all processors
> > without hardware misalignment support

There may need to be a few C implementations for different CPU
instruction sets.
While the compiler might manage to DTRT (or the wrong thing given
the right source), using a loop that matches the instruction set
is a good idea.

For instance, x86 can do *(reg_1 + reg_2 * (1|2|4|8) + constant)
so you can increment reg_2 and use it for both buffers while
still unrolling enough to hit memory bandwidth.

With only *(reg_1 + constant) you need to increment both the
source and destination addresses.

OTOH you can save an instruction on x86 by adding to 'reg_2'
until it becomes zero (so you don't need add, cmp and jmp).
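
A rough sketch of that index trick in C, for illustration only.
The function name is made up, the buffers are assumed to be aligned
and non-overlapping, the unrolling is left out, and whether the
compiler really emits indexed accesses plus an add and a jnz is up
to the compiler:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n_words 64-bit words using one shared index. */
static void copy_words_indexed(uint64_t *dst, const uint64_t *src,
                               size_t n_words)
{
    /* Point both bases at the end of their buffers ... */
    const uint64_t *src_end = src + n_words;
    uint64_t *dst_end = dst + n_words;
    /* ... and run a single negative index up towards zero. */
    ptrdiff_t i = -(ptrdiff_t)n_words;

    while (i != 0) {
        /* Same index for both buffers: *(base + index * 8). */
        dst_end[i] = src_end[i];
        /* The add can set the zero flag, so no separate cmp is needed. */
        i++;
    }
}
```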

But a mips-like instruction set (which includes riscv and nios2)
has 'compare and branch', so you only ever need one instruction
at the end of the loop.
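
For comparison, a sketch of the loop shape that suits a
*(reg_1 + constant) instruction set. Again the name is invented and
the buffers are assumed aligned and non-overlapping. Both pointers
get incremented, and the loop test is a single pointer comparison
that a compare-and-branch instruction can do in one go:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n_words 64-bit words with two moving pointers. */
static void copy_words_ptr(uint64_t *dst, const uint64_t *src,
                           size_t n_words)
{
    const uint64_t *src_end = src + n_words;

    /* One compare-and-branch against the end pointer closes the loop. */
    while (src != src_end) {
        /* Load, store, and bump both addresses. */
        *dst++ = *src++;
    }
}
```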

Having to handle misaligned copies is another distinct issue.
On some 32-bit CPUs, byte copies may be as fast as any shift-and-mask
code.
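
To make the two options concrete, here is a minimal sketch of both.
The names are hypothetical, little-endian byte order is assumed, and
the shifting version takes the misaligned source as aligned words
plus a byte offset of 1 to 3 (so it needs one extra source word).
Which one wins on a given cpu needs measuring:

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: plain byte copy, no alignment requirements at all. */
static void copy_bytes(unsigned char *dst, const unsigned char *src,
                       size_t n)
{
    while (n--)
        *dst++ = *src++;
}

/*
 * Shifting alternative (little-endian assumed).
 * The data starts 'off' bytes (1..3) into src_words[0], and
 * src_words must hold n_words + 1 aligned 32-bit words.
 */
static void copy_shifted(uint32_t *dst, const uint32_t *src_words,
                         unsigned off, size_t n_words)
{
    unsigned rshift = 8 * off;       /* drops the low 'off' bytes */
    unsigned lshift = 32 - rshift;   /* pulls bytes in from the next word */
    uint32_t lo = src_words[0];
    size_t i;

    for (i = 0; i < n_words; i++) {
        uint32_t hi = src_words[i + 1];

        /* Only aligned loads and stores; the merge is done in registers. */
        dst[i] = (lo >> rshift) | (hi << lshift);
        lo = hi;
    }
}
```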

> > (yes, firmware is expected to
> > trap and handle these, but they are painfully slow).

Yes, to the point where the system should just panic and
force you to fix the code.

When I were a lad we forced everyone to fix their code
so it would run on sparc.

David
