Re: [PATCH 07/10] crypto: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
From: Linus Torvalds
Date: Wed Apr 13 2022 - 15:54:34 EST
On Tue, Apr 12, 2022 at 10:47 PM Catalin Marinas
<catalin.marinas@xxxxxxx> wrote:
>
> I agree. There is also an implicit expectation that the DMA API works on
> kmalloc'ed buffers and that's what ARCH_DMA_MINALIGN is for (and the
> dynamic arch_kmalloc_minalign() in this series). But the key point is
> that the driver doesn't need to know the CPU cache topology, coherency,
> the DMA API and kmalloc() take care of these.
Honestly, I think it would probably be worth discussing the "kmalloc
DMA alignment" issues.
99.9% of kmalloc users don't want to do DMA.
And there's actually a fair amount of small kmalloc for random stuff.
Right now on my laptop, I have
kmalloc-8 16907 18432 8 512 1 : ...
according to slabinfo, so almost 17 _thousand_ allocations of 8 bytes.
It's all kinds of sad if those allocations need to be 64 bytes in size
just because of some silly DMA alignment issue, when none of them want
it.
Yeah, yeah, wasting a megabyte of memory is "just a megabyte" these
days. Which is crazy. It's literally memory that could have been used
for something much more useful than just pure and utter waste.
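
As a rough check on the "megabyte" figure, here is a minimal userspace sketch of the arithmetic. It assumes the slabinfo columns above are active objects, total objects, and object size, and it takes the 64-byte minimum from the text. The helper name padding_waste is hypothetical, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Extra bytes consumed when every object of obj_size bytes is padded
 * up to min_align bytes. Hypothetical helper, for illustration only.
 */
static size_t padding_waste(size_t num_objs, size_t obj_size, size_t min_align)
{
    assert(min_align > 0);
    if (obj_size >= min_align)
        return 0;               /* already large enough, no padding */
    return num_objs * (min_align - obj_size);
}
```

With the numbers quoted above, padding_waste(18432, 8, 64) comes to 1032192 bytes, which is just under one MiB for the kmalloc-8 cache alone.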
I think we could and should just say "people who actually require DMA
accesses should say so at kmalloc time". We literally have that
GFP_DMA and ZONE_DMA for various historical reasons, so we've been
able to do that before.
No, that historical GFP_DMA isn't what arm64 wants - it's the old
crazy "legacy 16MB DMA" thing that ISA DMA used to have.
But the basic issue was true then, and is true now - DMA allocations
are fairly special, and should not be that hard to just mark as such.
We could add a trivial wrapper function like
static void *dma_kmalloc(size_t size, gfp_t flags)
{ return kmalloc(size | (ARCH_DMA_MINALIGN-1), flags); }
which now means that the size argument is guaranteed to be big enough
(and will not overflow to zero) that you get that aligned memory
allocation.
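
The size trick in that wrapper can be sketched in userspace as follows. This is only an illustration under stated assumptions: DEMO_DMA_MINALIGN is an assumed stand-in for ARCH_DMA_MINALIGN (which is per-architecture), and demo_size_class is a simplified power-of-two model of kmalloc size classes, not the real allocator. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration; the real ARCH_DMA_MINALIGN is per-architecture. */
#define DEMO_DMA_MINALIGN ((size_t)64)

/* Pad a request the way the wrapper does: OR in (align - 1). */
static size_t dma_padded_size(size_t size, size_t align)
{
    /* align must be a non-zero power of two for the mask to make sense */
    assert(align != 0 && (align & (align - 1)) == 0);
    return size | (align - 1);
}

/*
 * Simplified model of size classes: the smallest power of two,
 * starting at 8, that holds the request. The real kmalloc caches
 * also include sizes such as 96 and 192, which this model ignores.
 */
static size_t demo_size_class(size_t size)
{
    size_t bucket = 8;

    while (bucket < size && bucket <= SIZE_MAX / 2)
        bucket <<= 1;
    return bucket;
}
```

In this model an 8-byte request becomes 63 and lands in the 64-byte class, so it can never share a 64-byte block with another object. Because OR only sets bits, the padded size cannot wrap around to zero, unlike the usual (size + align - 1) & ~(align - 1) rounding. One side effect is that a request of exactly 64 bytes becomes 127 and moves up to the 128-byte class.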
We could perhaps even have other special rules. Including really
specific ones, like saying
- allocations smaller than 32 bytes are not DMA coherent, because we pack them
which would allow those small allocations to not pointlessly waste memory.
I dunno. But it's ridiculous that arm64 wastes so much memory when
it's approximately never needed.
Linus