From: Arnd Bergmann
Date: Wed Apr 06 2022 - 07:01:12 EST

On Wed, Apr 6, 2022 at 1:59 AM Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> wrote:
> On Tue, Apr 05, 2022 at 02:57:49PM +0100, Catalin Marinas wrote:
> > In preparation for supporting a dynamic kmalloc() minimum alignment,
> > allow architectures to define ARCH_KMALLOC_MINALIGN independently of
> > ARCH_DMA_MINALIGN. In addition, always define ARCH_DMA_MINALIGN even if
> > an architecture does not override it.
> >
> [ +Cc slab maintainer/reviewers ]
> I get why you want to set minimum alignment of kmalloc() dynamically.
> That's because cache line size can be different and we cannot statically
> know that, right?
> But I don't get why you are trying to decouple ARCH_KMALLOC_MINALIGN
> from ARCH_DMA_MINALIGN. kmalloc'ed buffer is always supposed to be DMA-safe.
> I'm afraid this series may break some archs/drivers.
> in Documentation/dma-api-howto.rst:
> >
> > Architectures must ensure that kmalloc'ed buffer is
> > DMA-safe. Drivers and subsystems depend on it. If an architecture
> > isn't fully DMA-coherent (i.e. hardware doesn't ensure that data in
> > the CPU cache is identical to data in main memory),
> > ARCH_DMA_MINALIGN must be set so that the memory allocator
> > makes sure that kmalloc'ed buffer doesn't share a cache line with
> > the others. See arch/arm/include/asm/cache.h as an example.
> >
> > Note that ARCH_DMA_MINALIGN is about DMA memory alignment
> > constraints. You don't need to worry about the architecture data
> > alignment constraints (e.g. the alignment constraints about 64-bit
> > objects).
> If I'm missing something, please let me know :)

It helps in two ways:

- You can start with a relatively large hardcoded ARCH_DMA_MINALIGN
of 128 or 256 bytes, depending on what the largest possible line size
is for any machine you want to support, and then drop that down to
32 or 64 bytes based on runtime detection. This should always be safe,
and it means a very sizable chunk of wasted memory can be recovered.

- On systems that are fully cache coherent, there is no need to align
kmalloc() allocations for DMA safety at all. On these, we can drop the
size even below the cache line size. This does not apply to most of the
cheaper embedded or mobile SoCs, but it helps a lot on the machines
you'd find in a data center.
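
To make the two cases above concrete, here is a rough userspace sketch of
the selection logic. It is not code from the series: the function names
and the values 128 and 8 are made up for illustration, and the real
detection of line size and coherency is arch-specific and not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values for illustration, not taken from any arch header. */
#define ARCH_DMA_MINALIGN	128	/* largest line size we want to support */
#define ARCH_KMALLOC_MINALIGN	8	/* plain data alignment, no DMA constraint */

/*
 * Pick the kmalloc() minimum alignment once at boot (hypothetical helper).
 * cache_line: line size found by runtime detection, 0 if detection failed.
 * coherent:   true if all DMA on this machine is cache coherent.
 */
static inline size_t pick_kmalloc_minalign(size_t cache_line, bool coherent)
{
	/* Fully coherent: no DMA padding needed, go below the line size. */
	if (coherent)
		return ARCH_KMALLOC_MINALIGN;

	/* Detection failed: stay with the safe compile-time upper bound. */
	if (cache_line == 0)
		return ARCH_DMA_MINALIGN;

	/* Non-coherent: drop to the detected line size, never below the floor. */
	return cache_line > ARCH_KMALLOC_MINALIGN ? cache_line
						  : ARCH_KMALLOC_MINALIGN;
}

/* Space taken by an object of 'size' bytes; 'align' must be a power of two. */
static inline size_t round_up_pow2(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}
```

With these made-up numbers, a 24-byte object takes
round_up_pow2(24, 128) = 128 bytes under the hardcoded value,
64 bytes after dropping to a detected 64-byte line, and 24 bytes on a
fully coherent machine.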