Re: [PATCH v2 4/6] crypto: x86/chacha20 - add XChaCha20 support

From: Eric Biggers
Date: Wed Dec 05 2018 - 01:11:00 EST


Hi Martin,

On Sat, Dec 01, 2018 at 05:40:40PM +0100, Martin Willi wrote:
>
> > An SSSE3 implementation of single-block HChaCha20 is also added so
> > that XChaCha20 can use it rather than the generic
> > implementation. This required refactoring the ChaCha permutation
> > into its own function.
>
> > [...]
>
> > +ENTRY(chacha20_block_xor_ssse3)
> > + # %rdi: Input state matrix, s
> > + # %rsi: up to 1 data block output, o
> > + # %rdx: up to 1 data block input, i
> > + # %rcx: input/output length in bytes
> > +
> > + # x0..3 = s0..3
> > + movdqa 0x00(%rdi),%xmm0
> > + movdqa 0x10(%rdi),%xmm1
> > + movdqa 0x20(%rdi),%xmm2
> > + movdqa 0x30(%rdi),%xmm3
> > + movdqa %xmm0,%xmm8
> > + movdqa %xmm1,%xmm9
> > + movdqa %xmm2,%xmm10
> > + movdqa %xmm3,%xmm11
> > +
> > + mov %rcx,%rax
> > + call chacha20_permute
> > +
> > # o0 = i0 ^ (x0 + s0)
> > paddd %xmm8,%xmm0
> > cmp $0x10,%rax
> > @@ -189,6 +198,23 @@ ENTRY(chacha20_block_xor_ssse3)
> >
> > ENDPROC(chacha20_block_xor_ssse3)
> >
> > +ENTRY(hchacha20_block_ssse3)
> > + # %rdi: Input state matrix, s
> > + # %rsi: output (8 32-bit words)
> > +
> > + movdqa 0x00(%rdi),%xmm0
> > + movdqa 0x10(%rdi),%xmm1
> > + movdqa 0x20(%rdi),%xmm2
> > + movdqa 0x30(%rdi),%xmm3
> > +
> > + call chacha20_permute
>
> AFAIK, the general convention is to create proper stack frames using
> FRAME_BEGIN/END for non-leaf functions. Should chacha20_permute()
> callers do so?
>

Yes, I'll do that. (Ard suggested the same for the arm64 version.)
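
For anyone following the refactoring described in the quoted commit message,
here is a rough sketch in portable C of how one permutation routine can serve
both the regular block function and HChaCha20. It is not the patch's SSSE3
code. The names sketch_permute(), sketch_chacha20_block() and
sketch_hchacha20_block() are made up for illustration, and the sketch works on
host-order 32-bit words, leaving out byte serialization and the XOR with the
data.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t v, int n)
{
	return (v << n) | (v >> (32 - n));
}

/* ChaCha quarter-round on four state words. */
#define QR(a, b, c, d) do {			\
	a += b; d ^= a; d = rotl32(d, 16);	\
	c += d; b ^= c; b = rotl32(b, 12);	\
	a += b; d ^= a; d = rotl32(d, 8);	\
	c += d; b ^= c; b = rotl32(b, 7);	\
} while (0)

/*
 * Hypothetical stand-in for the shared permutation: 20 rounds
 * (10 double rounds) applied in place, with no feed-forward.
 */
static void sketch_permute(uint32_t x[16])
{
	for (int i = 0; i < 20; i += 2) {
		/* column round */
		QR(x[0], x[4], x[8],  x[12]);
		QR(x[1], x[5], x[9],  x[13]);
		QR(x[2], x[6], x[10], x[14]);
		QR(x[3], x[7], x[11], x[15]);
		/* diagonal round */
		QR(x[0], x[5], x[10], x[15]);
		QR(x[1], x[6], x[11], x[12]);
		QR(x[2], x[7], x[8],  x[13]);
		QR(x[3], x[4], x[9],  x[14]);
	}
}

/* ChaCha20 block: permute, then add the original state back in. */
static void sketch_chacha20_block(const uint32_t state[16], uint32_t out[16])
{
	uint32_t x[16];

	for (int i = 0; i < 16; i++)
		x[i] = state[i];
	sketch_permute(x);
	for (int i = 0; i < 16; i++)
		out[i] = x[i] + state[i];
}

/*
 * HChaCha20: permute, then output words 0..3 and 12..15 of the
 * permuted state directly, without the feed-forward addition.
 */
static void sketch_hchacha20_block(const uint32_t state[16], uint32_t out[8])
{
	uint32_t x[16];

	for (int i = 0; i < 16; i++)
		x[i] = state[i];
	sketch_permute(x);
	for (int i = 0; i < 4; i++) {
		out[i] = x[i];
		out[4 + i] = x[12 + i];
	}
}
```

The two callers differ only in what they do after the permutation, which is
why splitting it out makes sense. In the assembly, that same split is what
turns both entry points into non-leaf functions.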

- Eric