Re: [PATCH v3 2/3] lib/crypto: x86/sha1: PHE Extensions optimized SHA1 transform function
From: Eric Biggers
Date: Thu Mar 05 2026 - 14:18:58 EST
On Thu, Mar 05, 2026 at 09:37:01AM +0800, AlanSong-oc wrote:
> > Also, the spec describes all four registers as both input and output
> > registers. Yet your inline asm marks %rax and %rcx as inputs only.
>
> Thank you for pointing this question out.
>
> On the one hand, when the '+' constraint modifier is applied to an
> operand, it is treated as both an input and an output operand.
> Therefore, %rsi and %rdi are considered input operands as well.
>
> On the other hand, after the instruction executes, the values in %rax,
> %rsi, and %rcx are modified. These registers should therefore use the
> '+' constraint modifier to inform the compiler that their values are
> updated by the assembly code. We cannot rely on clobbers to indicate
> that the values of input operands are modified following the suggestion
> by gcc manual. However, since %rax is initialized with a constant value,
> it does not need the '+' constraint modifier. It should can simply be
> specified as an input operand.
>
> In addition, although %rdi itself is not modified by the instruction but
> the memory it references may be updated, a "memory" clobber should be
> added to notify the compiler about possible memory side effects.
>
> The corrected inline assembly should be written as follows:
>
> asm volatile(".byte 0xf3,0x0f,0xa6,0xc8" /* REP XSHA1 */
> : "+S"(data), "+c"(nblocks)
> : "a"((long)-1), "D"(dst)
> : "memory");
If the instruction both reads and writes %rax, then the constraint needs
to be "+a", even if the C code doesn't use the updated value. Otherwise
the compiler can assume that the value stored in %rax is unchanged and
optimize the code accordingly, for example by not reinitializing %rax if
the constant -1 is needed again later on.
Yes, this means you'll need to move the constant -1 to a local variable.
> > As before, all these comments apply to the SHA-256 patch too.
>
> Surely, I will also apply all of the suggestions mentioned above to the
> SHA-256 patch.
I also have to ask: are you sure you need SHA-1 to be optimized at all?
SHA-1 has been deprecated for a long time. Most users have moved to
SHA-256 and other stronger algorithms, and those that haven't need to
move very soon. There's little value in adding new optimized code for
SHA-1.
How about simplifying your patch to just SHA-256? Then we can focus on
the one that's actually important and not on the deprecated SHA-1.
- Eric