Re: [PATCH v2] lib/raid/xor: x86: Add AVX-512 optimized xor_gen()

From: Christoph Hellwig

Date: Wed Jun 17 2026 - 01:52:30 EST

On Mon, Jun 15, 2026 at 11:44:35AM -0700, Eric Biggers wrote:
> > Doesn't zen4 only have a 256bit bus between the cpu and cache?
> > So avx512 reads take two clocks.
> > Since this is memory limited it is unlikely to run faster than the
> > avx256 version.
>
> On AMD Genoa (Zen 4 server processor), the AVX-512 code added by this
> patch is indeed about the same speed as the existing AVX-2 code.

The same is true for Zen 5 mobile, which has the same AVX-512
limitations.  I don't think it's the bus width, but I'll leave the
details to the experts.

>
> > OTOH if it doesn't cause down-clocking as well then it won't be slower.
>
> Yes, as far as I know that's not an issue on AMD processors, even Zen 4.
> The "avoid AVX-512 due to downclocking" rule is historical guidance for
> Intel processors that had a bad implementation of AVX-512. There's no
> reason to exclude Zen 4 from executing AVX-512 optimized code. At worst
> it will just be the same, as we're seeing here.

It does not cause downclocking.  But for some of the more complicated
code I've seen AVX-512 being significantly slower than AVX2 on these
processors.  So we need to watch out and not automatically assume
AVX-512 is faster.
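
To illustrate measuring rather than assuming, here is a minimal
userspace sketch in C that times each candidate on the same buffers and
picks whichever is fastest.  Everything in it is hypothetical: the
names (xor_candidate, pick_fastest, ...) are made up for illustration,
the two implementations are plain C stand-ins rather than AVX2/AVX-512
code, and none of it is the interface from this patch.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical candidate type, for illustration only. */
typedef void (*xor_fn)(uint8_t *dst, const uint8_t *src, size_t len);

struct xor_candidate {
	const char *name;
	xor_fn fn;
};

/* Stand-in for a "narrow" implementation: one byte at a time. */
static void xor_bytes(uint8_t *dst, const uint8_t *src, size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] ^= src[i];
}

/* Stand-in for a "wide" implementation: 8 bytes at a time, then the tail. */
static void xor_words(uint8_t *dst, const uint8_t *src, size_t len)
{
	size_t i = 0;

	for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
		uint64_t a, b;

		/* memcpy avoids unaligned and aliasing problems. */
		memcpy(&a, dst + i, sizeof(a));
		memcpy(&b, src + i, sizeof(b));
		a ^= b;
		memcpy(dst + i, &a, sizeof(a));
	}
	for (; i < len; i++)
		dst[i] ^= src[i];
}

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Time `reps` runs of one candidate; returns elapsed nanoseconds. */
static uint64_t time_candidate(xor_fn fn, uint8_t *dst, const uint8_t *src,
			       size_t len, int reps)
{
	uint64_t start = now_ns();

	for (int i = 0; i < reps; i++)
		fn(dst, src, len);
	return now_ns() - start;
}

/* Return the candidate with the smallest measured time (first wins ties). */
static const struct xor_candidate *
pick_fastest(const struct xor_candidate *cands, size_t n,
	     uint8_t *dst, const uint8_t *src, size_t len, int reps)
{
	const struct xor_candidate *best = NULL;
	uint64_t best_ns = UINT64_MAX;

	for (size_t i = 0; i < n; i++) {
		uint64_t ns = time_candidate(cands[i].fn, dst, src, len, reps);

		if (ns < best_ns) {
			best_ns = ns;
			best = &cands[i];
		}
	}
	return best;
}
```

A caller would fill an array of struct xor_candidate with whatever
implementations the CPU supports and use the one pick_fastest()
returns.  A single short timing run like this is noisy, so a real
selection would need larger buffers, more repetitions, and some care
about cache state.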