Re: Why the auxiliary cipher in gss_krb5_crypto.c?

From: Ard Biesheuvel
Date: Fri Dec 04 2020 - 12:07:25 EST


On Fri, 4 Dec 2020 at 17:52, David Howells <dhowells@xxxxxxxxxx> wrote:
>
> Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>
> > OK, I guess I don't understand the question. I haven't thought about
> > this code in at least a decade. What's an auxiliary cipher? Is this a
> > question about why we're implementing something, or how we're
> > implementing it?
>
> That's what the Linux sunrpc implementation calls them:
>
> struct crypto_sync_skcipher *acceptor_enc;
> struct crypto_sync_skcipher *initiator_enc;
> struct crypto_sync_skcipher *acceptor_enc_aux;
> struct crypto_sync_skcipher *initiator_enc_aux;
>
> Auxiliary ciphers aren't mentioned in rfc396{1,2} so it appears to be
> something peculiar to that implementation.
>
> So acceptor_enc and acceptor_enc_aux, for instance, are both based on the same
> key, and the implementation seems to pass the IV from one to the other. The
> only difference is that the 'aux' cipher lacks the CTS wrapping - which only
> makes a difference for the final two blocks[*] of the encryption (or
> decryption) - and only if the data doesn't fully fill out the last block
> (ie. it needs padding in some way so that the encryption algorithm can handle
> it).
>
> [*] Encryption cipher blocks, that is.
>
> So I think its purpose is twofold:
>
> (1) It's a way to be a bit more efficient, cutting out the CTS layer's
> indirection and additional buffering.
>
> (2) crypto_skcipher_encrypt() assumes that it's doing the entire crypto
> operation in one go and will always impose the final CTS bit, so you
> can't call it repeatedly to progress through a buffer (as
> xdr_process_buf() would like to do) as that would corrupt the data being
> encrypted - unless you made sure that the data was always block-size
> aligned (in which case, there's no point using CTS).
>
> I wonder how much going through three layers of crypto modules costs. Looking
> at how AES can be implemented using, say, Intel AES instructions, it looks like
> AES+CBC should be easy to do in a single module. I wonder if we could have
> optimised kerberos crypto that do the AES and the SHA together in a single
> loop.
>

The tricky thing with CTS is that you have to ensure that the final
full and partial blocks are presented to the crypto driver as one
chunk, or it won't be able to perform the ciphertext stealing. This
might be the reason for the current approach. If the sunrpc code has
multiple disjoint chunks of data to encrypt, it is always better to
wrap them in a single scatterlist and call into the skcipher only once.
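
To make the constraint concrete, here is a minimal userspace sketch,
not the kernel code. It assumes a toy 16-byte block permutation in
place of AES (not secure) and the Kerberos-style CTS variant that
always swaps the last two ciphertext blocks. All function names are
hypothetical. It shows that plain CBC can be fed chunk by chunk as
long as the IV is carried forward, and that only the final two blocks
need the CTS step, whereas applying CTS to every chunk gives a
different result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BS 16 /* block size in bytes */

/* Toy keyed block permutation standing in for AES. NOT secure. */
static void toy_encrypt_block(const uint8_t key[BS], uint8_t b[BS])
{
    uint8_t t[BS];
    for (int i = 0; i < BS; i++)
        t[i] = (uint8_t)((b[(i + 5) % BS] ^ key[i]) * 167u + 13u);
    memcpy(b, t, BS);
}

/* Plain CBC over whole blocks, in place. iv is updated to the last
 * ciphertext block so a later call can continue the chain. */
static void cbc_encrypt(const uint8_t key[BS], uint8_t iv[BS],
                        uint8_t *buf, size_t len)
{
    assert(len % BS == 0);
    for (size_t off = 0; off < len; off += BS) {
        for (int i = 0; i < BS; i++)
            buf[off + i] ^= iv[i];
        toy_encrypt_block(key, buf + off);
        memcpy(iv, buf + off, BS);
    }
}

/* CBC with ciphertext stealing, in place: zero-pad the final block,
 * run CBC, swap the last two ciphertext blocks. The first len bytes
 * are the output. buf needs room for len rounded up to BS. */
static void cts_cbc_encrypt(const uint8_t key[BS], uint8_t iv[BS],
                            uint8_t *buf, size_t len)
{
    assert(len > BS);
    size_t tail = (len % BS) ? (len % BS) : BS;
    size_t padded = len - tail + BS;
    uint8_t tmp[BS];

    memset(buf + len, 0, padded - len);
    cbc_encrypt(key, iv, buf, padded);
    memcpy(tmp, buf + padded - 2 * BS, BS);
    memcpy(buf + padded - 2 * BS, buf + padded - BS, BS);
    memcpy(buf + padded - BS, tmp, BS);
}

static void fill(uint8_t key[BS], uint8_t *a, uint8_t *b, size_t len)
{
    for (int i = 0; i < BS; i++)
        key[i] = (uint8_t)(i * 7 + 1);
    for (size_t i = 0; i < len; i++)
        a[i] = b[i] = (uint8_t)i;
}

/* Chunked plain CBC for the bulk, then CTS on the last two blocks,
 * compared against CTS over the whole buffer in one call. */
int split_matches_one_shot(void)
{
    uint8_t key[BS], iv1[BS] = {0}, iv2[BS] = {0}, a[64], b[64];
    size_t len = 53;                 /* 3 full blocks + 5 bytes */
    size_t bulk = len - (len % BS) - BS; /* all but last two blocks */

    fill(key, a, b, len);
    cts_cbc_encrypt(key, iv1, a, len);

    cbc_encrypt(key, iv2, b, BS);             /* chunk 1 */
    cbc_encrypt(key, iv2, b + BS, bulk - BS); /* chunk 2 */
    cts_cbc_encrypt(key, iv2, b + bulk, len - bulk);
    return memcmp(a, b, len) == 0;
}

/* Applying CTS to every chunk swaps blocks mid-stream, so the
 * result no longer matches the one-shot encryption. */
int naive_chunking_differs(void)
{
    uint8_t key[BS], iv1[BS] = {0}, iv2[BS] = {0}, a[64], b[64];
    size_t len = 53;

    fill(key, a, b, len);
    cts_cbc_encrypt(key, iv1, a, len);

    cts_cbc_encrypt(key, iv2, b, 2 * BS);
    cts_cbc_encrypt(key, iv2, b + 2 * BS, len - 2 * BS);
    return memcmp(a, b, len) != 0;
}
```

In this sketch, cbc_encrypt() plays the role of the 'aux' cipher and
cts_cbc_encrypt() the role of the CTS-wrapped one. A single call
over one scatterlist corresponds to the one-shot path.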

However, I would recommend against it: at least for ARM and arm64, I
have already contributed SIMD-based implementations that use SIMD
permutation instructions and overlapping loads and stores to perform
the ciphertext stealing, which means that there is only a single layer
which implements CTS+CBC+AES, and this layer can consume the entire
scatterlist in one go. We could easily do something similar in the
AES-NI driver as well.