Re: [PATCH v3] lib/raid/xor: x86: Add AVX-512 optimized xor_gen()

In reply to: Christoph Hellwig: "Re: [PATCH v3] lib/raid/xor: x86: Add AVX-512 optimized xor_gen()"

From: Christoph Hellwig

Date: Wed Jun 17 2026 - 01:57:18 EST

Can you use the "xor:" prefix that all other commits to lib/raid/xor use?

> Benchmark on AMD Ryzen 9 9950X (Zen 5):
>
> src_cnt  avx         avx512      Improvement
> =======  ==========  ==========  ===========
> 1        56353 MB/s  75388 MB/s  33%
> 2        54274 MB/s  68409 MB/s  26%
> 3        44649 MB/s  64042 MB/s  43%
> 4        41315 MB/s  55002 MB/s  33%

On my Zen 5 mobile (AMD Ryzen AI 7 PRO 350), both the existing
AVX2 code and this AVX-512 code give numbers in the 200+ GB/s range. Not
sure if it is just the different benchmarking or something else going on.

FYI, one or two sources are basically useless, as they correspond to RAID5
configs that have no benefit over simple mirroring, so those numbers
aren't too interesting.

> +DO_XOR_BLOCKS(avx512_inner, xor_avx512_2, xor_avx512_3, xor_avx512_4,
> + xor_avx512_5);

Is there really much benefit to using the historic DO_XOR_BLOCKS
over writing the loop manually? Especially since the common case for
a modern RAID loops over more disks than this was built for, i.e., in
practice one or two source buffers only show up at the end of a loop
over more disks.