Re: [PATCH v2] dma-contiguous: setup default pernuma cma area if not configured explicitly

From: Feng Tang

Date: Fri Apr 24 2026 - 02:41:38 EST


On Thu, Apr 23, 2026 at 02:51:06PM +0100, Robin Murphy wrote:
> On 23/04/2026 10:52 am, Feng Tang wrote:
> > There was a report on a multi-numa-nodes ARM server that when IOMMU is
> > disabled, the dma_alloc_coherent() function always returns memory from
> > node 0 even for devices attaching to other nodes, while they can get
> > local dma memory when IOMMU is on with the same API.
> >
> > The reason is, when IOMMU is disabled, the dma_alloc_coherent() will
> > go the direct way and call dma_alloc_contiguous(). The system doesn't
> > have any explicit cma setting (like per-numa cma), and only has a
> > default 64MB cma reserved area (on node 0), where kernel will try
> > first to allocate memory from.
> >
> > Robin Murphy suggested to setup pernuma cma or disable cma, which did
> > solve the issue. While there is still concern that for customers
> > which don't have much kernel knowledge, they could still suffer from
> > this silently as some architectures enable cma area by default (not
> > an issue for X86 though, which set CONFIG_CMA_SIZE_MBYTES to 0 by
> > default) for most Linux distributions.
> >
> > One thought is to follow the current cma reserving policy for platform
> > with 'CONFIG_DMA_NUMA_CMA=y', that if the numa cma is not explicitly
> > configured, set it up according to CONFIG_CMA_SIZE_MBYTES (The
> > percentage kernel option is not considered yet as the number of NUMA
> > nodes could be big).
>
> IIRC, the main reason it's still an opt-in is that it doesn't necessarily
> interact all that well with the default CMA area, and what we definitely
> don't want to do is unexpectedly start allocating additional CMA areas on
> systems which don't need nor want them.

I see the point, thanks.

>
> I guess what might be ideal would be to rearrange things such that when
> DMA_NUMA_CMA is enabled, we try to merge the "global" CMA area with a
> per-node area that can satisfy the necessary size, ZONE_DMA32, etc.
> constraints, such that unless explicitly configured otherwise, we don't end
> up making that extra n+1th allocation. Similarly I'm not sure about the
> usefulness of having two separate types of per-node area, especially given
> the apparent intent that users only want one _or_ the other, so probably
> dma_contiguous_numa_area[] should really have just been a generalisation of
> dma_contiguous_pernuma_area[] in the first place...

Yes, that makes sense. I did wonder what happens when both of them are
configured on the cmdline. The two NUMA DMA arrays would be better merged,
maybe giving 'numa_cma' a higher priority. One use case I know of for
'numa_cma' is a system where some sockets have a GPU/GPGPU card for AI
attached and need a huge cma area specifically for it.
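
To make the priority idea concrete, here is a minimal user-space sketch
(not kernel code) of how a merged per-node size table could be filled.
merged_cma_size[] and merge_numa_cma_sizes() are hypothetical names, and
MAX_NUMNODES is fixed at 8 purely for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NUMNODES 8	/* illustrative value, not the kernel's */

/* Hypothetical merged table: one size per node, 0 means "no area". */
static uint64_t merged_cma_size[MAX_NUMNODES];

/*
 * Resolve the size to reserve on each node when both options are given.
 * numa_size[] models the per-node sizes parsed from "numa_cma=",
 * pernuma_size models the single size parsed from "cma_pernuma=".
 * A node-specific "numa_cma" entry takes priority over the global size.
 */
static void merge_numa_cma_sizes(const uint64_t *numa_size,
				 uint64_t pernuma_size, size_t nr_nodes)
{
	size_t nid;

	assert(nr_nodes <= MAX_NUMNODES);

	for (nid = 0; nid < nr_nodes; nid++)
		merged_cma_size[nid] = numa_size[nid] ? numa_size[nid]
						      : pernuma_size;
}
```

With this rule, a node hosting a big accelerator keeps its large
'numa_cma' size while all other nodes fall back to the 'cma_pernuma' size.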


> I can imagine that work being quite involved though, as all the interaction
> between default, command line and devicetree/platform controls is sure to be
> fiddly. As a compromise for now, I think rather than trying to imply a
> default "cma_pernuma" behaviour, it would be cleaner to instead imply
> "numa_cma" to only replicate the default area across nodes other than the
> one which has it already. That then would inherently avoid changing anything
> for single-node systems; otherwise at the very least any automatic fiddling
> with pernuma_size_bytes should depend on num_online_nodes() > 1.

Something like the pseudo code below?
(One tricky point is getting the node id of the default cma area:
'struct cma' is a private definition in mm/cma.h, so we can't access
cma->nid inside contiguous.c for now.)

---
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index d8fd6f779f79..6694ae62e785 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -97,6 +97,7 @@ static struct cma *dma_contiguous_numa_area[MAX_NUMNODES];
static phys_addr_t numa_cma_size[MAX_NUMNODES] __initdata;
static struct cma *dma_contiguous_pernuma_area[MAX_NUMNODES];
static phys_addr_t pernuma_size_bytes __initdata;
+static bool numa_cma_configured;

static int __init early_numa_cma(char *p)
{
@@ -125,6 +126,7 @@ static int __init early_numa_cma(char *p)
break;
}

+ numa_cma_configured = true;
return 0;
}
early_param("numa_cma", early_numa_cma);
@@ -132,6 +134,7 @@ early_param("numa_cma", early_numa_cma);
static int __init early_cma_pernuma(char *p)
{
pernuma_size_bytes = memparse(p, &p);
+ numa_cma_configured = true;
return 0;
}
early_param("cma_pernuma", early_cma_pernuma);
@@ -182,6 +185,11 @@ static void __init dma_numa_cma_reserve(void)
ret, nid);
}

+ if (!numa_cma_configured && dma_contiguous_default_area) {
+ if (nid != dma_contiguous_default_area->nid)
+ numa_cma_size[nid] = cma_get_size(dma_contiguous_default_area);
+ }
+
if (numa_cma_size[nid]) {

cma = &dma_contiguous_numa_area[nid];
@@ -216,8 +224,6 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
phys_addr_t selected_limit = limit;
bool fixed = false;

- dma_numa_cma_reserve();
-
pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);

if (size_cmdline != -1) {
@@ -256,6 +262,8 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
if (ret)
pr_warn("Couldn't register default CMA heap.");
}
+
+ dma_numa_cma_reserve();
}

void __weak
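
The intent of the hunks above can be modeled in user space roughly as
follows. This is only a sketch under assumptions: struct cma_model stands
in for the private 'struct cma', cma_model_get_nid() is a hypothetical
accessor, online nodes are assumed to be numbered 0..nr_online-1, pages
are assumed to be 4K, and the nr_online > 1 check reflects Robin's
num_online_nodes() suggestion rather than anything in the diff:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NUMNODES 8	/* illustrative value, not the kernel's */
#define PAGE_SHIFT 12	/* assumes 4K pages */

/* Stand-in for the private 'struct cma' from mm/cma.h. */
struct cma_model {
	unsigned long count;	/* area size in pages */
	int nid;		/* node the area was reserved on */
};

/* Hypothetical accessor for the node id of an area. */
static int cma_model_get_nid(const struct cma_model *cma)
{
	return cma->nid;
}

/* Area size in bytes. */
static uint64_t cma_model_get_size(const struct cma_model *cma)
{
	return (uint64_t)cma->count << PAGE_SHIFT;
}

/*
 * If nothing was configured explicitly, request an area of the default
 * size on every node except the one already holding the default area.
 * Single-node systems are left untouched.
 */
static void imply_numa_cma(uint64_t *numa_cma_size, size_t nr_online,
			   const struct cma_model *def, bool configured)
{
	size_t nid;

	assert(nr_online <= MAX_NUMNODES);

	if (configured || !def || nr_online <= 1)
		return;

	for (nid = 0; nid < nr_online; nid++)
		if ((int)nid != cma_model_get_nid(def))
			numa_cma_size[nid] = cma_model_get_size(def);
}
```

For a 64MB default area on node 0 of a 4-node system, this requests 64MB
on nodes 1-3 and nothing extra on node 0.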


Thanks,
Feng

>
> Thanks,
> Robin.