Re: [PATCH v2] x86/crc32: use builtins to improve code generation

From: Eric Biggers
Date: Fri Feb 28 2025 - 16:20:57 EST


On Thu, Feb 27, 2025 at 03:47:03PM -0800, Bill Wendling wrote:
> For both gcc and clang, crc32 builtins generate better code than the
> inline asm. GCC improves, removing unneeded "mov" instructions. Clang
> does the same and unrolls the loops. GCC has no changes on i386, but
> Clang's code generation is vastly improved, due to Clang's "rm"
> constraint issue.
>
> The number of cycles improved by ~0.1% for GCC and ~1% for Clang, which
> is expected because of the "rm" issue. However, Clang's performance is
> better than GCC's by ~1.5%, most likely due to loop unrolling.

Also note that the patch
https://lore.kernel.org/r/20250210210741.471725-1-ebiggers@xxxxxxxxxx/ (which is
already queued in the crc tree for 6.15) changes "rm" to "r" when the compiler
is clang, to improve clang's code generation. The numbers you quote are against
the code without that patch, right?
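
For reference, here is a minimal user-space sketch of the two approaches being
compared. It is not the kernel code: the names step_sw, step_asm, step_builtin
and the crc32c_* wrappers are hypothetical, and it assumes GCC or clang. The
inline asm variant uses the "rm" constraint discussed above; the builtin variant
uses __builtin_ia32_crc32qi. On non-x86 targets, or on CPUs without SSE4.2, both
fall back to a bitwise software implementation.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#define HAVE_X86_CRC32 1
#else
#define HAVE_X86_CRC32 0
#endif

typedef uint32_t (*step_fn)(uint32_t crc, uint8_t byte);

/* Portable bitwise CRC32C step (reflected polynomial 0x82F63B78). */
static uint32_t step_sw(uint32_t crc, uint8_t byte)
{
	crc ^= byte;
	for (int i = 0; i < 8; i++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return crc;
}

/* Run one step function over a buffer, with the usual init and final XOR. */
static uint32_t crc32c_with(step_fn step, const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = ~0u;

	while (len--)
		crc = step(crc, *p++);
	return ~crc;
}

uint32_t crc32c_sw(const void *buf, size_t len)
{
	return crc32c_with(step_sw, buf, len);
}

#if HAVE_X86_CRC32
/* Inline asm: "rm" lets the compiler choose a register or memory operand. */
static uint32_t step_asm(uint32_t crc, uint8_t byte)
{
	__asm__("crc32b %1, %0" : "+r" (crc) : "rm" (byte));
	return crc;
}

/* Builtin: the compiler sees the operation and can schedule it itself. */
static __attribute__((target("sse4.2")))
uint32_t step_builtin(uint32_t crc, uint8_t byte)
{
	return __builtin_ia32_crc32qi(crc, byte);
}

/* Runtime check so the crc32 instruction is only used where it exists. */
static int have_hw_crc32(void)
{
	return __builtin_cpu_supports("sse4.2");
}
#endif

uint32_t crc32c_asm(const void *buf, size_t len)
{
#if HAVE_X86_CRC32
	if (have_hw_crc32())
		return crc32c_with(step_asm, buf, len);
#endif
	return crc32c_sw(buf, len);
}

uint32_t crc32c_builtin(const void *buf, size_t len)
{
#if HAVE_X86_CRC32
	if (have_hw_crc32())
		return crc32c_with(step_builtin, buf, len);
#endif
	return crc32c_sw(buf, len);
}
```

All three wrappers should agree on any input; for the ASCII string "123456789"
they return 0xE3069283, the standard CRC32C check value. The per-byte function
pointer is only there to keep the sketch short and says nothing about the
relative performance of the two variants.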

- Eric