Re: [PATCH v2] raid6: arm64: add SVE optimized implementation for syndrome generation

From: Christoph Hellwig

Date: Tue Mar 31 2026 - 02:37:17 EST


On Mon, Mar 30, 2026 at 06:39:49PM +0200, Ard Biesheuvel wrote:
> I think the results are impressive, but I'd like to better understand
> its implications on a real-world scenario. Is this code only a
> bottleneck when rebuilding an array?

The syndrome generation is run every time you write data to a RAID6
array, and if you do partial stripe writes it (or rather the XOR
variant) is run twice. So this is the most performance-critical
path for writing to RAID6.
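For anyone following along: the syndrome here is the (P, Q) parity pair, where P is the plain XOR of the data blocks and Q is a GF(2^8) weighted sum that the kernel evaluates Horner-style. A byte-at-a-time user-space sketch of the math (illustrative only; the real lib/raid6 code, and the SVE/NEON versions under discussion, process whole stripes with SIMD):

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply a GF(2^8) element by the generator x (i.e. 2), reducing by
 * the RAID6 polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d). */
static uint8_t gf_mul2(uint8_t v)
{
	return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/* Compute P (plain XOR) and Q (weighted GF(2^8) sum) for one byte
 * position across ndisks data bytes, Horner-style:
 *   P = d[0] ^ d[1] ^ ... ^ d[n-1]
 *   Q = d[0] + g*d[1] + g^2*d[2] + ... + g^(n-1)*d[n-1]   (in GF(2^8))
 */
static void gen_syndrome_byte(int ndisks, const uint8_t *d,
			      uint8_t *p, uint8_t *q)
{
	uint8_t wp = d[ndisks - 1];
	uint8_t wq = d[ndisks - 1];

	for (int i = ndisks - 2; i >= 0; i--) {
		wq = gf_mul2(wq) ^ d[i];
		wp ^= d[i];
	}
	*p = wp;
	*q = wq;
}
```

The partial-stripe XOR variant mentioned above only has to update the
P/Q deltas for the changed blocks rather than recompute the whole sum.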

Rebuild usually runs totally different code, but can end up here as well
when both parity disks are lost.

> > Furthermore, as Christoph suggested, I tested scalability on wider
> > arrays since the default kernel benchmark is hardcoded to 8 disks,
> > which doesn't give the unrolled SVE loop enough data to shine. On a
> > 16-disk array, svex4 hits 15.1 GB/s compared to 8.0 GB/s for neonx4.
> > On a 24-disk array, while neonx4 chokes and drops to 7.8 GB/s, svex4
> > maintains a stable 15.0 GB/s — effectively doubling the throughput.
>
> Does this mean the kernel benchmark is no longer fit for purpose? If
> it cannot distinguish between implementations that differ in performance
> by a factor of 2, I don't think we can rely on it to pick the optimal one.

It is not good, and we should either fix it or run more than one.
The current setup is not really representative of a real-life array.
It also leads to wrong selections on x86, but only in terms of which
unroll level gets picked, and only with minor performance differences
so far. I plan to address this in the next version of the raid6 lib
patches.
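For reference, the selection logic being criticized boils down to: run each candidate gen_syndrome against a fixed dummy stripe (currently 8 disks) for a short time budget and keep whichever completes the most work. A user-space sketch of that pick-the-fastest loop (the names, the toy candidates, and the clock()-based budget are illustrative, not the kernel's actual jiffies-based code in lib/raid6/algos.c):

```c
#include <time.h>

/* Simplified model of the raid6 algorithm-selection benchmark: time
 * each candidate for a fixed budget, keep the one with most iterations. */
struct raid6_calls {
	const char *name;
	void (*gen)(void);
};

/* Toy stand-ins for real gen_syndrome implementations. */
static void gen_fast(void)
{
}

static void gen_slow(void)
{
	volatile unsigned x = 0;

	for (int i = 0; i < 1000; i++)
		x += i;
}

static const struct raid6_calls *pick_best(const struct raid6_calls *c,
					   int n)
{
	const struct raid6_calls *best = NULL;
	long best_iters = -1;

	for (int i = 0; i < n; i++) {
		long iters = 0;
		clock_t end = clock() + CLOCKS_PER_SEC / 50; /* ~20 ms */

		while (clock() < end) {
			c[i].gen();
			iters++;
		}
		if (iters > best_iters) {
			best_iters = iters;
			best = &c[i];
		}
	}
	return best;
}
```

The point above is that the *input* to this loop is fixed at 8 disks,
so two implementations that only diverge on wider stripes benchmark
identically and the selection can be wrong for the array actually in use.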