Re: [PATCH v3 2/3] lib/crypto: x86/sha1: PHE Extensions optimized SHA1 transform function

From: AlanSong-oc

Date: Wed Mar 11 2026 - 07:38:08 EST


On 3/6/26 03:18, Eric Biggers wrote:
> On Thu, Mar 05, 2026 at 09:37:01AM +0800, AlanSong-oc wrote:
>>> Also, the spec describes all four registers as both input and output
>>> registers. Yet your inline asm marks %rax and %rcx as inputs only.
>>
>> Thank you for pointing this question out.
>>
>> On the one hand, when the '+' constraint modifier is applied to an
>> operand, it is treated as both an input and an output operand.
>> Therefore, %rsi and %rdi are considered input operands as well.
>>
>> On the other hand, after the instruction executes, the values in %rax,
>> %rsi, and %rcx are modified. These registers should therefore use the
>> '+' constraint modifier to inform the compiler that their values are
>> updated by the assembly code. We cannot rely on clobbers to indicate
>> that the values of input operands are modified following the suggestion
>> by gcc manual. However, since %rax is initialized with a constant value,
>> it does not need the '+' constraint modifier. It should can simply be
>> specified as an input operand.
>>
>> In addition, although %rdi itself is not modified by the instruction but
>> the memory it references may be updated, a "memory" clobber should be
>> added to notify the compiler about possible memory side effects.
>>
>> The corrected inline assembly should be written as follows:
>>
>> asm volatile(".byte 0xf3,0x0f,0xa6,0xc8" /* REP XSHA1 */
>> : "+S"(data), "+c"(nblocks)
>> : "a"((long)-1), "D"(dst)
>> : "memory");
>
> If the instruction both reads and writes %rax, then the constraint needs
> to be "+a", even if the C code doesn't use the updated value. Otherwise
> the compiler can assume that the value stored in %rax is unchanged and
> optimize the code accordingly, for example by not reinitializing %rax if
> the constant -1 is needed again later on.
>
> Yes, this means you'll need to move the constant -1 to a local variable.

Indeed, to ensure that the compiler generates correct binary code under
all optimization levels, the inline assembly should accurately describe
the behavior of the instruction. I will use a local variable initialized
to -1 for the instruction reads and writes %rax register in the next
version of the patch.

>>> As before, all these comments apply to the SHA-256 patch too.
>>
>> Surely, I will also apply all of the suggestions mentioned above to the
>> SHA-256 patch.
>
> I also have to ask: are you sure you need SHA-1 to be optimized at all?
> SHA-1 has been deprecated for a long time. Most users have moved to
> SHA-256 and other stronger algorithms, and those that haven't need to
> move very soon. There's little value in adding new optimized code for
> SHA-1.
>
> How about simplifying your patch to just SHA-256? Then we can focus on
> the one that's actually important and not on the deprecated SHA-1.

It is true that SHA-1 is rarely used by most users today. However, it
may still be needed in certain scenarios. For those cases, we would like
to add support for the XHSA1 instruction to accelerate SHA-1.

Does the crypto community have any plans to remove SHA-1 support in
recent kernel versions?

Best Regards
AlanSong-oc