Re: [RFC PATCH 0/4] crypto: add CRYPTO_TFM_REQ_DMA flag

From: Ard Biesheuvel
Date: Tue Dec 08 2020 - 02:44:37 EST


On Mon, 7 Dec 2020 at 14:50, Horia Geantă <horia.geanta@xxxxxxx> wrote:
>
> On 11/26/2020 9:09 AM, Ard Biesheuvel wrote:
> > On Wed, 25 Nov 2020 at 22:39, Iuliana Prodan <iuliana.prodan@xxxxxxx> wrote:
> >>
> >> On 11/25/2020 11:16 PM, Ard Biesheuvel wrote:
> >>> On Wed, 25 Nov 2020 at 22:14, Iuliana Prodan (OSS)
> >>> <iuliana.prodan@xxxxxxxxxxx> wrote:
> >>>>
> >>>> From: Iuliana Prodan <iuliana.prodan@xxxxxxx>
> >>>>
> >>>> Add the option to allocate the crypto request object plus any extra space
> >>>> needed by the driver in DMA-able memory.
> >>>>
> >>>> Add CRYPTO_TFM_REQ_DMA flag to be used by backend implementations to
> >>>> indicate to the crypto API the need to allocate GFP_DMA memory
> >>>> for private contexts of the crypto requests.
> >>>>
> >>>
> >>> These are always directional DMA mappings, right? So why can't we use
> >>> bounce buffering here?
> >>>
> >> The idea was to avoid allocating any memory in crypto drivers.
> >> We want to be able to use dm-crypt with CAAM, which needs DMA-able
> >> memory and increasing reqsize is not enough.
> >
> > But what does 'needs DMA-able memory' mean? DMA operations are
> > asynchronous by definition, and so the DMA layer should be able to
> > allocate bounce buffers when needed. This will cost some performance
> > in cases where the hardware cannot address all of memory directly, but
> > this is a consequence of the design, and I don't think we should
> > burden the generic API with this.
> >
> The performance loss due to bounce buffering is non-negligible.
> Previous experiments we did showed a 35% gain (when forcing all data,
> including I/O buffers, in ZONE_DMA32).
>
> I don't have the exact numbers for bounce buffering introduced by allowing
> only the control data structures (descriptors etc.) in high memory,
> but I don't think it's fair to easily dismiss this topic,
> given the big performance drop and relatively low impact
> on the generic API.
>

It is not about the impact on the API. It is about the layering
violation: all masters in a system will be affected by DMA addressing
limitations, and all will be affected by the performance impact of
bounce buffering when it is needed. DMA accessible memory is generally
'owned' by the DMA layer so it can be used for bounce buffering for
all masters. If one device starts claiming DMA-able memory for its own
use, other masters could be adversely affected, given that they may
not be able to do DMA at all (not even via bounce buffers) once a
single master uses up all DMA-able memory.
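
To make the exhaustion scenario concrete, here is a toy userspace model
(a sketch only, not kernel code). Every name in it (dma_pool,
map_for_dma, hog_pool, DMA_LIMIT, POOL_SLOTS) is made up for
illustration and does not correspond to anything in the kernel DMA API.
It assumes a master with a 32-bit addressing limit and a small shared
pool of low-memory slots that the mapping layer uses for bouncing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the master can only address memory below 4 GiB. */
#define DMA_LIMIT  UINT64_C(0x100000000)
/* Assumption: a tiny shared pool, so exhaustion is easy to show. */
#define POOL_SLOTS 4

/* Low memory that the mapping layer hands out as bounce slots. */
struct dma_pool {
	bool in_use[POOL_SLOTS];
};

enum map_result { MAP_DIRECT, MAP_BOUNCED, MAP_FAILED };

/* Take one free slot from the shared pool; returns its index or -1. */
int pool_take(struct dma_pool *pool)
{
	for (int i = 0; i < POOL_SLOTS; i++) {
		if (!pool->in_use[i]) {
			pool->in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/* Return a previously taken slot to the shared pool. */
void pool_release(struct dma_pool *pool, int slot)
{
	assert(slot >= 0 && slot < POOL_SLOTS && pool->in_use[slot]);
	pool->in_use[slot] = false;
}

/*
 * Map a buffer at physical address 'phys' for the limited master.
 * Buffers the master can reach are used directly; anything else
 * needs a bounce slot, and the mapping fails if none is left.
 */
enum map_result map_for_dma(struct dma_pool *pool, uint64_t phys, int *slot)
{
	*slot = -1;
	if (phys < DMA_LIMIT)
		return MAP_DIRECT;

	*slot = pool_take(pool);
	return *slot >= 0 ? MAP_BOUNCED : MAP_FAILED;
}

/*
 * Model of one driver claiming low memory for its own private
 * contexts: it takes up to 'count' slots and keeps them.
 */
int hog_pool(struct dma_pool *pool, int count)
{
	int taken = 0;

	while (taken < count && pool_take(pool) >= 0)
		taken++;
	return taken;
}
```

In this model, once one driver has called hog_pool(&pool, POOL_SLOTS),
map_for_dma() on a buffer above DMA_LIMIT returns MAP_FAILED for every
other master, whereas buffers below the limit still map directly.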