Re: [PATCH] crypto: gcm - fix cacheline sharing

From: Ard Biesheuvel
Date: Thu May 30 2019 - 11:16:54 EST


On Thu, 30 May 2019 at 16:53, Pascal Van Leeuwen
<pvanleeuwen@xxxxxxxxxxxxxxxx> wrote:
>
> >
> > This might work:
> >
> > diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
> > index c0ece44f303b..3d313d2a279a 100644
> > --- a/drivers/crypto/caam/caamalg.c
> > +++ b/drivers/crypto/caam/caamalg.c
> > @@ -1661,7 +1661,8 @@ static int aead_decrypt(struct aead_request *req)
> >   * allocate and map the skcipher extended descriptor for skcipher
> >   */
> >  static struct skcipher_edesc *skcipher_edesc_alloc(struct skcipher_request *req,
> > -                                                   int desc_bytes)
> > +                                                   int desc_bytes,
> > +                                                   u8 const *input_iv)
> >  {
> >         struct crypto_skcipher *skcipher = crypto_skcipher_reqtfm(req);
> >         struct caam_ctx *ctx = crypto_skcipher_ctx(skcipher);
> > @@ -1745,7 +1746,7 @@ static struct skcipher_edesc *skcipher_edesc_alloc(struct skcipher_request *req,
> >         /* Make sure IV is located in a DMAable area */
> >         if (ivsize) {
> >                 iv = (u8 *)edesc->hw_desc + desc_bytes + sec4_sg_bytes;
> > -               memcpy(iv, req->iv, ivsize);
> > +               memcpy(iv, input_iv, ivsize);
> >
> >                 iv_dma = dma_map_single(jrdev, iv, ivsize, DMA_TO_DEVICE);
> >                 if (dma_mapping_error(jrdev, iv_dma)) {
> > @@ -1801,7 +1802,8 @@ static int skcipher_encrypt(struct skcipher_request *req)
> >         int ret = 0;
> >
> >         /* allocate extended descriptor */
> > -       edesc = skcipher_edesc_alloc(req, DESC_JOB_IO_LEN * CAAM_CMD_SZ);
> > +       edesc = skcipher_edesc_alloc(req, DESC_JOB_IO_LEN * CAAM_CMD_SZ,
> > +                                    req->iv);
> >         if (IS_ERR(edesc))
> >                 return PTR_ERR(edesc);
> >
> > @@ -1832,13 +1834,11 @@ static int skcipher_decrypt(struct skcipher_request *req)
> >         struct caam_ctx *ctx = crypto_skcipher_ctx(skcipher);
> >         int ivsize = crypto_skcipher_ivsize(skcipher);
> >         struct device *jrdev = ctx->jrdev;
> > +       u8 in_iv[AES_BLOCK_SIZE];
> >         u32 *desc;
> >         int ret = 0;
> >
> > -       /* allocate extended descriptor */
> > -       edesc = skcipher_edesc_alloc(req, DESC_JOB_IO_LEN * CAAM_CMD_SZ);
> > -       if (IS_ERR(edesc))
> > -               return PTR_ERR(edesc);
> > +       memcpy(in_iv, req->iv, ivsize);
> >
> >         /*
> >          * The crypto API expects us to set the IV (req->iv) to the last
> > @@ -1848,6 +1848,11 @@ static int skcipher_decrypt(struct skcipher_request *req)
> >         scatterwalk_map_and_copy(req->iv, req->src, req->cryptlen - ivsize,
> >                                  ivsize, 0);
> >
> > +       /* allocate extended descriptor */
> > +       edesc = skcipher_edesc_alloc(req, DESC_JOB_IO_LEN * CAAM_CMD_SZ,
> > +                                    in_iv);
> > +       if (IS_ERR(edesc))
> > +               return PTR_ERR(edesc);
> > +
> >         /* Create and submit job descriptor*/
> >         init_skcipher_job(req, edesc, false);
> >         desc = edesc->hw_desc;
>
> Interesting. This discussion made me reevaluate my own implementation.
>
> First thing I realised is that I *am* currently doing the IV copying with
> the data buffer mapped for DMA ... If I understand correctly, that may be
> a bad idea on some systems? In which case I will need to do my copy earlier.
>

Yes, this may break on non-coherent systems, since the DMA layer does
not expect cachelines covering the mapped region to enter the dirty
state while the mapping is active. DMA unmap does a clean+invalidate
to make the data written to main memory by the device visible to the
CPU, and it does not expect the 'clean' part of that operation to
result in any writebacks. If such a writeback does occur, it does so
at cacheline granularity, therefore potentially overwriting some data
that was just written by the device.
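
To illustrate with a hypothetical snippet (not code from any particular
driver): on a non-coherent system, the sequence below can silently
corrupt data written by the device, because the CPU store dirties a
cacheline inside the live mapping, and the clean performed at unmap
time writes that line back:

    buf_dma = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

    memcpy(buf, iv, ivsize);    /* BAD: dirties a cacheline inside the
                                   mapped region */

    /* ... the device writes its output into buf via DMA ... */

    dma_unmap_single(dev, buf_dma, len, DMA_FROM_DEVICE);
    /* the 'clean' half of the unmap writes the dirty cacheline back,
       clobbering whatever the device put in that line */

So any CPU stores into the buffer need to happen before the mapping is
created (or after it has been torn down), which is what doing the copy
earlier achieves.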

> Second thing is, the above implementation made me realise I don't need to
> buffer the IV as part of the skcipher_request_ctx: I can just copy the
> original req->iv to some local variable, pass a pointer to that along, and
> then overwrite req->iv directly. More elegant, and hopefully also more
> efficient ...
>
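
Indeed. One caveat: the stack buffer itself must never be mapped for
DMA (broken in general, and especially with CONFIG_VMAP_STACK), so the
callee has to memcpy() the IV into a DMA-able location before
returning, the way skcipher_edesc_alloc() does above. A minimal sketch
of the pattern (driver function names hypothetical):

    static int xxx_skcipher_decrypt(struct skcipher_request *req)
    {
            struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
            int ivsize = crypto_skcipher_ivsize(tfm);
            u8 in_iv[AES_BLOCK_SIZE];

            /* snapshot the input IV before req->iv is overwritten */
            memcpy(in_iv, req->iv, ivsize);

            /* hand the output IV back in req->iv, as the API expects */
            scatterwalk_map_and_copy(req->iv, req->src,
                                     req->cryptlen - ivsize, ivsize, 0);

            /*
             * Pass the snapshot down; the callee must copy it out of
             * this stack frame (into a DMA-able buffer) before this
             * function returns.
             */
            return xxx_queue_request(req, in_iv);
    }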