Re: [PATCH] crypto: gcm - fix cacheline sharing

From: Ard Biesheuvel
Date: Thu May 30 2019 - 09:29:38 EST


On Thu, 30 May 2019 at 15:18, Horia Geanta <horia.geanta@xxxxxxx> wrote:
>
> On 5/30/2019 11:08 AM, Ard Biesheuvel wrote:
> > On Thu, 30 May 2019 at 09:46, Horia Geanta <horia.geanta@xxxxxxx> wrote:
> >>
> >> On 5/30/2019 8:34 AM, Herbert Xu wrote:
> >>> On Wed, May 29, 2019 at 01:27:28PM -0700, Eric Biggers wrote:
> >>>>
> >>>> So what about the other places that also pass an IV located next to the data,
> >>>> like crypto/ccm.c and crypto/adiantum.c? If we're actually going to make this a
> >> Fix for ccm is WIP.
> >> We were not aware of adiantum since our crypto engine does not accelerate it.
> >>
> >>>> new API requirement, then we need to add a debugging option that makes the API
> >>>> detect this violation so that the other places can be fixed too.
> >>>>
> >> IMO this is not a new crypto API requirement.
> >> crypto API and its users must follow DMA API rules, besides crypto-specific ones.
> >>
> >> In this particular case, crypto/gcm.c is both an implementation and a crypto API
> >> user, since it uses underneath ctr(aes) (and ghash).
> >> Currently generic gcm implementation is breaking DMA API, since part of the dst
> >> buffer (auth_tag) provided to ctr(aes) is sharing a cache line with some other
> >> data structure (iv).
> >>
> >> The DMA API rule is mentioned in Documentation/DMA-API.txt
> >>
> >> .. warning::
> >>
> >> Memory coherency operates at a granularity called the cache
> >> line width. In order for memory mapped by this API to operate
> >> correctly, the mapped region must begin exactly on a cache line
> >> boundary and end exactly on one (to prevent two separately mapped
> >> regions from sharing a single cache line).
> >>
> >>
> >
> > This is overly restrictive, and not in line with reality. The whole
> > networking stack operates on buffers shifted by 2 bytes if
> > NET_IP_ALIGN is left at its default value of 2. There are numerous
> > examples in other places as well.
> >
> > Given that kmalloc() will take the cacheline granularity into account
> > if necessary, the only way this issue can hit is when a single kmalloc
> > buffer is written to by two different masters.
> >
> I guess there are only two options:
> -either cache line sharing is avoided OR
> -users need to be *aware* they are sharing the cache line and some rules /
> assumptions are in place on how to safely work on the data
>
> What you are probably saying is that 2nd option is sometimes the way to go.
>
> >>>> Also, did you consider whether there's any way to make the crypto API handle
> >>>> this automatically, so that all the individual users don't have to?
> >> That would probably work, but I guess it would come up with a big overhead.
> >>
> >> I am thinking crypto API would have to check each buffer used by src, dst
> >> scatterlists is correctly aligned - starting and ending on cache line boundaries.
> >>
> >>>
> >>> You're absolutely right Eric.
> >>>
> >>> What I suggested in the old thread is non-sense. While you can
> >>> force GCM to provide the right pointers you cannot force all the
> >>> other crypto API users to do this.
> >>>
> >> Whose problem is that crypto API users don't follow the DMA API requirements?
> >>
> >>> It would appear that Ard's latest suggestion should fix the problem
> >>> and is the correct approach.
> >>>
> >> I disagree.
> >> We shouldn't force crypto implementations to be aware of such inconsistencies in
> >> the I/O data buffers (pointed to by src/dst scatterlists) that are supposed to
> >> be safely DMA mapped.
> >>
> >
> > I'm on the fence here. On the one hand, it is slightly dodgy for the
> > GCM driver to pass a scatterlist referencing a buffer that shares a
> > cacheline with another buffer passed by an ordinary pointer, and for
> > which an explicit requirement exists that the callee should update it
> > before returning.
> >
> > On the other hand, I think it is reasonable to require drivers not to
> > perform such updates while the scatterlist is mapped for DMA, since
> > fixing it in the callers puts a disproportionate burden on them, given
> > that non-coherent DMA only represents a small minority of use cases.
> >
> The problem with this approach is that the buffers in the scatterlist could
> hypothetically share cache lines with *any* other CPU-updated data, not just the
> IV in the crypto request (as it happens here).
> How could a non-coherent DMA implementation cope with this?
>

I don't think the situation is that bad. We only support allocations
in the linear map for DMA/scatterlists, and buffers in the linear map
can only share a cacheline if they were allocated using the same
kmalloc() invocation (on systems that support non-coherent DMA)

That does mean that it is actually more straightforward to deal with
this at the level where the allocation occurs, and that would justify
dealing with it in the callers.

However, from the callee's point of view, it simply means that you
should not dereference any request struct pointers for writing while
the 'dst' scatterlist is mapped for DMA.

As I said, I am on the fence. I think there are arguments for both sides ...