Re: [PATCH v3 2/3] lib/crypto: x86/sha1: PHE Extensions optimized SHA1 transform function

From: Eric Biggers

Date: Sat Jan 17 2026 - 19:31:28 EST


On Fri, Jan 16, 2026 at 03:15:12PM +0800, AlanSong-oc wrote:
> Zhaoxin CPUs have implemented the SHA(Secure Hash Algorithm) as its CPU
> instructions by PHE(Padlock Hash Engine) Extensions, including XSHA1,
> XSHA256, XSHA384 and XSHA512 instructions.
>
> With the help of implementation of SHA in hardware instead of software,
> can develop applications with higher performance, more security and more
> flexibility.
>
> This patch includes the XSHA1 instruction optimized implementation of
> SHA-1 transform function.
>
> Signed-off-by: AlanSong-oc <AlanSong-oc@xxxxxxxxxxx>

Please include the information I've asked for (benchmark results, test
results, and link to the specification) directly in the commit message.

> +#if IS_ENABLED(CONFIG_CPU_SUP_ZHAOXIN)
> +#define PHE_ALIGNMENT 16
> +static void sha1_blocks_phe(struct sha1_block_state *state,
> + const u8 *data, size_t nblocks)

The IS_ENABLED(CONFIG_CPU_SUP_ZHAOXIN) should go in the CPU feature
check, so that the code will be parsed regardless of the setting. That
reduces the chance that future changes will cause compilation errors.

> + /*
> + * XSHA1 requires %edi to point to a 32-byte, 16-byte-aligned
> + * buffer on Zhaoxin processors.
> + */

This seems implausible. In 64-bit mode a pointer can't fit in %edi. I
thought you mentioned that this instruction is 64-bit compatible? You
may have meant %rdi.

Interestingly, the spec you provided specifically says the registers
operated on are %eax, %ecx, %esi, and %edi.

So assuming the code works, perhaps both the spec and your code comment
are incorrect?

These errors don't really confidence in this instruction.

> + memcpy(dst, state, SHA1_DIGEST_SIZE);
> + asm volatile(".byte 0xf3,0x0f,0xa6,0xc8"
> + : "+S"(data), "+D"(dst)
> + : "a"((long)-1), "c"(nblocks));
> + memcpy(state, dst, SHA1_DIGEST_SIZE);

Is the reason for using '.byte' that the GNU and clang assemblers don't
implement the mnemonic this Zhaoxin-specific instruction? The spec
implies that the intended mnemonic is "rep sha1".

If that's correct, could you add a comment like /* rep sha1 */ so that
it's clear what the intended instruction is?

Also, the spec describes all four registers as both input and output
registers. Yet your inline asm marks %rax and %rcx as inputs only.

> @@ -59,6 +79,11 @@ static void sha1_mod_init_arch(void)
> {
> if (boot_cpu_has(X86_FEATURE_SHA_NI)) {
> static_call_update(sha1_blocks_x86, sha1_blocks_ni);
> +#if IS_ENABLED(CONFIG_CPU_SUP_ZHAOXIN)
> + } else if (boot_cpu_has(X86_FEATURE_PHE_EN)) {
> + if (boot_cpu_data.x86 >= 0x07)
> + static_call_update(sha1_blocks_x86, sha1_blocks_phe);
> +#endif

I think it should be:

} else if (IS_ENABLED(CONFIG_CPU_SUP_ZHAOXIN) &&
boot_cpu_has(X86_FEATURE_PHE_EN) &&
boot_cpu_data.x86 >= 0x07) {
static_call_update(sha1_blocks_x86, sha1_blocks_phe);

... so (a) the code will be parsed even when !CONFIG_CPU_SUP_ZHAOXIN,
and (b) functions won't be unnecessarily disabled when
boot_cpu_has(X86_FEATURE_PHE_EN) && boot_cpu_data.x86 < 0x07).

As before, all these comments apply to the SHA-256 patch too.

- Eric