Re: [PATCH v2] raid6: arm64: add SVE optimized implementation for syndrome generation

In reply to: Mark Brown: "Re: [PATCH v2] raid6: arm64: add SVE optimized implementation for syndrome generation"

From: Robin Murphy

Date: Thu Apr 16 2026 - 13:03:25 EST

On 16/04/2026 5:47 pm, Mark Brown wrote:

> On Thu, Apr 16, 2026 at 05:26:08PM +0100, Robin Murphy wrote:
>
> > Unless you've got a CPU with truly big wide vector units that _can't_ be
> > fully utilised by ASIMD ops, then SVE is only really offering whatever
> > incidental benefits fall out of smaller code size. However, if you do have
> > those wider vectors, then the cost of correctly saving/restoring the SVE
> > state - of which a userspace benchmark isn't likely to be very
> > representative - is also going to scale up significantly.
>
> The other case will be when there's some SVE-only extension that
> accelerates something that's relevant for the algorithm. That's not
> really a thing at present but I imagine that we'll run into that at some
> point.

Indeed - I was implicitly thinking in terms of things that _are_ just
transliterated from NEON to SVE, where the primary gain is stuff like
predicated loops, but even that _could_ potentially be enough to justify
an argument for in-kernel SVE (using a 128-bit VL to keep the additional
state/cost to a minimum).

Cheers,
Robin.