Dear Xuerui,
Thank you for your patches.
On 03.08.23 at 19:08, WANG Xuerui wrote:
From: WANG Xuerui <git@xxxxxxxxxx>
Similar to the syndrome calculation, the recovery algorithms also work
on 64 bytes at a time to align with the L1 cache line size of current
and future LoongArch cores (that we care about). This means
unrolled-by-4 LSX and unrolled-by-2 LASX code.
The assembly was originally based on the x86 SSSE3/AVX2 ports, but the
register allocation has been redone to take advantage of LSX/LASX's 32
vector registers, and the instruction sequence has been optimized to
suit (e.g. LoongArch can perform per-byte srl and andi on vectors, but
x86 cannot).
Performance numbers measured by instrumenting the raid6test code:
It’d be great if you also documented your test setup; that’s always useful context for benchmark numbers.
lasx 2data: 354.987 MiB/s
lasx datap: 350.430 MiB/s
lsx 2data: 340.026 MiB/s
lsx datap: 337.318 MiB/s
intx1 2data: 164.280 MiB/s
intx1 datap: 187.966 MiB/s
So the speed is more than doubled. Nice job! The lasx implementation is always the fastest. Is it therefore the preferred one? Or does it come with higher power consumption?
Signed-off-by: WANG Xuerui <git@xxxxxxxxxx>
Out of curiosity, what is your “first” name?
---
include/linux/raid/pq.h | 2 +
lib/raid6/Makefile | 2 +-
lib/raid6/algos.c | 8 +
lib/raid6/recov_loongarch_simd.c | 515 +++++++++++++++++++++++++++++++
lib/raid6/test/Makefile | 2 +-
5 files changed, 527 insertions(+), 2 deletions(-)
create mode 100644 lib/raid6/recov_loongarch_simd.c
[snip]
+
+ /* Now, pick the proper data tables */
+ pbmul = raid6_vgfmul[raid6_gfexi[failb-faila]];
Should spaces be put around the operator?
+ qmul = raid6_vgfmul[raid6_gfinv[raid6_gfexp[faila] ^
+ raid6_gfexp[failb]]];
+
[snip]
+
+ /* Now, pick the proper data tables */
+ qmul = raid6_vgfmul[raid6_gfinv[raid6_gfexp[faila]]];
Only one space after qmul?
[snip]
+ /* Now, pick the proper data tables */
+ pbmul = raid6_vgfmul[raid6_gfexi[failb-faila]];
Ditto.
[snip]
+ /* Now, pick the proper data tables */
+ qmul = raid6_vgfmul[raid6_gfinv[raid6_gfexp[faila]]];
Ditto.
[snip]
diff --git a/lib/raid6/test/Makefile b/lib/raid6/test/Makefile
index 7b244bce32b3d..2abe0076a636c 100644
--- a/lib/raid6/test/Makefile
+++ b/lib/raid6/test/Makefile
@@ -65,7 +65,7 @@ else ifeq ($(HAS_ALTIVEC),yes)
OBJS += altivec1.o altivec2.o altivec4.o altivec8.o \
vpermxor1.o vpermxor2.o vpermxor4.o vpermxor8.o
else ifeq ($(ARCH),loongarch64)
- OBJS += loongarch_simd.o
+ OBJS += loongarch_simd.o recov_loongarch_simd.o
endif
.c.o:
Kind regards,
Paul
PS: I brought up the raid speed tests in the past, and Borislav called them a random number generator [1]. ;-)
[1]: https://lore.kernel.org/all/20210406124126.GM17806@xxxxxxx/