Re: [PATCH v2] raid6: arm64: add SVE optimized implementation for syndrome generation
From: Demian Shulhan
Date: Tue Mar 31 2026 - 09:20:14 EST
Hi all,
Ard, your questions regarding real-world I/O bottlenecks and SVE power
efficiency versus raw throughput are entirely valid. I agree that
introducing SVE support requires solid real-world data to justify the
added complexity.
Due to my current workload, I won't be able to run the necessary
hardware tests and prepare the benchmark code immediately. I will get
back to the list in about a week with the requested source code,
unmangled test results, and further analysis.
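In the meantime, for anyone who wants to reason about the hot path
being optimized here: below is a simplified byte-wise model of the
generic syndrome computation (structured like lib/raid6/int.uc, which
works Horner-style from the highest-numbered data disk down — the real
kernel code processes unsigned-long-sized lanes with SIMD-within-a-
register masking rather than one byte at a time, so treat this purely
as an illustration, not the kernel implementation):

```c
#include <stdint.h>
#include <stddef.h>

/* Multiply a byte by x (i.e. by 2) in GF(2^8) with the RAID6
 * generator polynomial 0x11d. */
static uint8_t gf2_mul2(uint8_t v)
{
	return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/*
 * Compute P (plain XOR parity) and Q (Reed-Solomon parity) over
 * disks - 2 data buffers of `bytes` each. ptrs[0..disks-3] are data,
 * ptrs[disks-2] is P, ptrs[disks-1] is Q. Q is accumulated via
 * Horner's rule: Q = D0 ^ 2*D1 ^ 4*D2 ^ ... in GF(2^8).
 */
void raid6_gen_syndrome_ref(int disks, size_t bytes, uint8_t **ptrs)
{
	uint8_t *p = ptrs[disks - 2];
	uint8_t *q = ptrs[disks - 1];
	int z0 = disks - 3;	/* highest-numbered data disk */

	for (size_t i = 0; i < bytes; i++) {
		uint8_t wp = ptrs[z0][i];
		uint8_t wq = wp;

		for (int z = z0 - 1; z >= 0; z--) {
			wp ^= ptrs[z][i];		/* P: running XOR */
			wq = gf2_mul2(wq) ^ ptrs[z][i];	/* Q: Horner step */
		}
		p[i] = wp;
		q[i] = wq;
	}
}
```

The inner loop over data disks is exactly the part that the NEON and
SVE variants unroll and vectorize, which is why the disk count in the
benchmark matters so much for the wider implementations.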
Thanks!
On Tue, Mar 31, 2026 at 09:37, Christoph Hellwig <hch@xxxxxx> wrote:
>
> On Mon, Mar 30, 2026 at 06:39:49PM +0200, Ard Biesheuvel wrote:
> > I think the results are impressive, but I'd like to better understand
> > the implications for real-world scenarios. Is this code only a
> > bottleneck when rebuilding an array?
>
> The syndrome generation is run every time you write data to a RAID6
> array, and if you do partial stripe writes it (or rather the XOR
> variant) is run twice. So this is the most performance critical
> path for writing to RAID6.
>
> Rebuild usually runs totally different code, but can end up here as well
> when both parity disks are lost.
>
> > > Furthermore, as Christoph suggested, I tested scalability on wider
> > > arrays since the default kernel benchmark is hardcoded to 8 disks,
> > > which doesn't give the unrolled SVE loop enough data to shine. On a
> > > 16-disk array, svex4 hits 15.1 GB/s compared to 8.0 GB/s for neonx4.
> > > On a 24-disk array, while neonx4 chokes and drops to 7.8 GB/s, svex4
> > > maintains a stable 15.0 GB/s — effectively doubling the throughput.
> >
> > Does this mean the kernel benchmark is no longer fit for purpose? If
> > it cannot distinguish between implementations that differ in performance
> > by a factor of 2, I don't think we can rely on it to pick the optimal one.
>
> It is not good, and we should either fix it or run more than one
> benchmark. The current setup is not really representative of a
> real-life array. It also leads to wrong selections on x86, but only
> in which unroll level gets picked, and only by minor differences so
> far. I plan to add this to the next version of the raid6 lib patches.
>