Re: [PATCH v2 4/6] crypto: x86/chacha20 - add XChaCha20 support

From: Martin Willi
Date: Sat Dec 01 2018 - 11:40:49 EST



> An SSSE3 implementation of single-block HChaCha20 is also added so
> that XChaCha20 can use it rather than the generic
> implementation. This required refactoring the ChaCha permutation
> into its own function.

> [...]

> +ENTRY(chacha20_block_xor_ssse3)
> + # %rdi: Input state matrix, s
> + # %rsi: up to 1 data block output, o
> + # %rdx: up to 1 data block input, i
> + # %rcx: input/output length in bytes
> +
> + # x0..3 = s0..3
> + movdqa 0x00(%rdi),%xmm0
> + movdqa 0x10(%rdi),%xmm1
> + movdqa 0x20(%rdi),%xmm2
> + movdqa 0x30(%rdi),%xmm3
> + movdqa %xmm0,%xmm8
> + movdqa %xmm1,%xmm9
> + movdqa %xmm2,%xmm10
> + movdqa %xmm3,%xmm11
> +
> + mov %rcx,%rax
> + call chacha20_permute
> +
> # o0 = i0 ^ (x0 + s0)
> paddd %xmm8,%xmm0
> cmp $0x10,%rax
> @@ -189,6 +198,23 @@ ENTRY(chacha20_block_xor_ssse3)
>
> ENDPROC(chacha20_block_xor_ssse3)
>
> +ENTRY(hchacha20_block_ssse3)
> + # %rdi: Input state matrix, s
> + # %rsi: output (8 32-bit words)
> +
> + movdqa 0x00(%rdi),%xmm0
> + movdqa 0x10(%rdi),%xmm1
> + movdqa 0x20(%rdi),%xmm2
> + movdqa 0x30(%rdi),%xmm3
> +
> + call chacha20_permute

AFAIK, the general convention is to create proper stack frames using
FRAME_BEGIN/FRAME_END for non-leaf functions. Should the callers of
chacha20_permute() do so?
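
As a rough, untested sketch of what that could look like for
hchacha20_block_ssse3, assuming the FRAME_BEGIN/FRAME_END macros from
<asm/frame.h> (they only emit the %rbp push/setup and pop when
CONFIG_FRAME_POINTER is enabled). The exact placement and the extra
include are assumptions, and the elided part of the function is left
as in the patch:

	#include <asm/frame.h>		/* assumed: not yet included by this file */

	ENTRY(hchacha20_block_ssse3)
		# %rdi: Input state matrix, s
		# %rsi: output (8 32-bit words)
		FRAME_BEGIN		# sets up a frame only if CONFIG_FRAME_POINTER=y

		movdqa 0x00(%rdi),%xmm0
		movdqa 0x10(%rdi),%xmm1
		movdqa 0x20(%rdi),%xmm2
		movdqa 0x30(%rdi),%xmm3

		call chacha20_permute

		# [...] output stores unchanged from the patch

		FRAME_END		# tears the frame down again before returning
		ret
	ENDPROC(hchacha20_block_ssse3)

The same pair would presumably wrap the body of
chacha20_block_xor_ssse3, since it now calls chacha20_permute as well.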

For the other parts:

Reviewed-by: Martin Willi <martin@xxxxxxxxxxxxxx>