Re: [PATCH net-next v5 03/20] zinc: ChaCha20 generic C implementation and selftest

From: Jason A. Donenfeld
Date: Tue Sep 18 2018 - 22:03:09 EST


On Wed, Sep 19, 2018 at 3:08 AM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> Does this consistently perform as well as an implementation that organizes the
> operations such that the quarterrounds for all columns/diagonals are
> interleaved? As-is, there are tight dependencies in QUARTER_ROUND() (as well as
> in the existing chacha20_block() in lib/chacha20.c, for that matter), so we're
> heavily depending on the compiler to do the needed interleaving so as to not get
> potentially disastrous performance. Making it explicit could be a good idea.

It does perform as well, and the compiler outputs good code, even on
older compilers. Notably, that's all a single statement (via the comma
operator).
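
For readers without the patch at hand, here is a rough userspace sketch of
what a comma-operator quarter round can look like. It is not the macro from
the patch: rol32_sketch(), QR_SKETCH() and DOUBLE_ROUND_SKETCH() are made-up
names, and the rotation constants and column/diagonal indices are the
standard ChaCha20 ones from RFC 7539. Each macro expands to one expression,
so the dependency chains of all four columns (or diagonals) end up in a
single statement that the compiler can schedule as it sees fit.

```c
#include <stdint.h>

/* Stand-in for the kernel's rol32(); n is always in 1..31 here. */
static inline uint32_t rol32_sketch(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

/* One ChaCha20 quarter round as a single comma expression. */
#define QR_SKETCH(x, a, b, c, d) ( \
	x[a] += x[b], x[d] = rol32_sketch(x[d] ^ x[a], 16), \
	x[c] += x[d], x[b] = rol32_sketch(x[b] ^ x[c], 12), \
	x[a] += x[b], x[d] = rol32_sketch(x[d] ^ x[a], 8), \
	x[c] += x[d], x[b] = rol32_sketch(x[b] ^ x[c], 7) \
)

/* Column round followed by diagonal round, still one expression. */
#define DOUBLE_ROUND_SKETCH(x) ( \
	QR_SKETCH(x, 0, 4,  8, 12), \
	QR_SKETCH(x, 1, 5,  9, 13), \
	QR_SKETCH(x, 2, 6, 10, 14), \
	QR_SKETCH(x, 3, 7, 11, 15), \
	QR_SKETCH(x, 0, 5, 10, 15), \
	QR_SKETCH(x, 1, 6, 11, 12), \
	QR_SKETCH(x, 2, 7,  8, 13), \
	QR_SKETCH(x, 3, 4,  9, 14) \
)

/* Apply the ten double rounds of ChaCha20 to a 16-word working state. */
static void twenty_rounds_sketch(uint32_t x[16])
{
	int i;

	for (i = 0; i < 10; i++)
		DOUBLE_ROUND_SKETCH(x);
}
```

Whether the generated code is actually interleaved still depends on the
compiler, so this only illustrates the source-level shape, not a
performance guarantee.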

> > +}
> > +
> > +static void chacha20_generic(u8 *out, const u8 *in, u32 len, const u32 key[8],
> > +			     const u32 counter[4])
> > +{
> > +	__le32 buf[CHACHA20_BLOCK_WORDS];
> > +	u32 x[] = {
> > +		EXPAND_32_BYTE_K,
> > +		key[0], key[1], key[2], key[3],
> > +		key[4], key[5], key[6], key[7],
> > +		counter[0], counter[1], counter[2], counter[3]
> > +	};
> > +
> > +	if (out != in)
> > +		memmove(out, in, len);
> > +
> > +	while (len >= CHACHA20_BLOCK_SIZE) {
> > +		chacha20_block_generic(buf, x);
> > +		crypto_xor(out, (u8 *)buf, CHACHA20_BLOCK_SIZE);
> > +		len -= CHACHA20_BLOCK_SIZE;
> > +		out += CHACHA20_BLOCK_SIZE;
> > +	}
> > +	if (len) {
> > +		chacha20_block_generic(buf, x);
> > +		crypto_xor(out, (u8 *)buf, len);
> > +	}
> > +}
>
> If crypto_xor_cpy() is used instead of crypto_xor(), and 'in' is incremented
> along with 'out', then the memmove() is not needed.

Nice idea, thanks. Implemented.
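
The suggested shape, sketched in plain userspace C for illustration. This
is not the code that went into the next revision: xor_cpy_sketch() is a
stand-in for the kernel's crypto_xor_cpy(), and block_fn_sketch,
toy_block_sketch() and stream_xor_sketch() are hypothetical names, with a
toy keystream in place of chacha20_block_generic(). The point is only that
XORing from 'in' into 'out' while advancing both pointers makes the
up-front memmove() unnecessary, and still works when out == in.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE_SKETCH 64

/* Stand-in for crypto_xor_cpy(): dst = src1 ^ src2, byte by byte. */
static void xor_cpy_sketch(uint8_t *dst, const uint8_t *src1,
			   const uint8_t *src2, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		dst[i] = src1[i] ^ src2[i];
}

/* Hypothetical keystream hook: fill one block, then advance the state. */
typedef void (*block_fn_sketch)(uint8_t block[BLOCK_SIZE_SKETCH], void *state);

/* Toy keystream for demonstration only: each block repeats a counter byte. */
static void toy_block_sketch(uint8_t block[BLOCK_SIZE_SKETCH], void *state)
{
	uint8_t *ctr = state;

	memset(block, *ctr, BLOCK_SIZE_SKETCH);
	(*ctr)++;
}

static void stream_xor_sketch(uint8_t *out, const uint8_t *in, size_t len,
			      block_fn_sketch next_block, void *state)
{
	uint8_t buf[BLOCK_SIZE_SKETCH];

	/* Full blocks: read from 'in', write to 'out', advance both. */
	while (len >= BLOCK_SIZE_SKETCH) {
		next_block(buf, state);
		xor_cpy_sketch(out, in, buf, BLOCK_SIZE_SKETCH);
		len -= BLOCK_SIZE_SKETCH;
		out += BLOCK_SIZE_SKETCH;
		in += BLOCK_SIZE_SKETCH;
	}
	/* Trailing partial block, if any. */
	if (len) {
		next_block(buf, state);
		xor_cpy_sketch(out, in, buf, len);
	}
}
```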

Jason