Re: [PATCH] dma-buf: heaps: Add Coherent heap to dmabuf heaps

From: Albert Esteve

Date: Fri Feb 27 2026 - 07:49:42 EST


Hi Maxime!

On Thu, Feb 26, 2026 at 11:12 AM Maxime Ripard <mripard@xxxxxxxxxx> wrote:
>
> Hi Albert,
>
> Thanks for your patch!
>
> On Tue, Feb 24, 2026 at 08:57:33AM +0100, Albert Esteve wrote:
> > Add a dma-buf heap for DT coherent reserved-memory
> > (i.e., 'shared-dma-pool' without 'reusable' property),
> > exposing one heap per region for userspace buffers.
> >
> > The heap binds a synthetic platform device to each region
> > so coherent allocations use the correct dev->dma_mem,
> > and it defers registration until late_initcall when
> > normal allocator are available.
> >
> > This patch includes charging of the coherent heap
> > allocator to the dmem cgroup.
> >
> > Signed-off-by: Albert Esteve <aesteve@xxxxxxxxxx>
> > ---
> > This patch introduces a new driver to expose DT coherent reserved-memory
> > regions as dma-buf heaps, allowing userspace buffers to be created.
> >
> > Since these regions are device-dependent, we bind a synthetic platform
> > device to each region so coherent allocations use the correct dev->dma_mem.
> >
> > Following Eric’s [1] and Maxime’s [2] work on charging DMA buffers
> > allocated from userspace to cgroups (dmem), this patch adds the same
> > charging pattern used by CMA heaps patch. Charging is done only through
> > the dma-buf heap interface so it can be attributed to a userspace allocator.
> >
> > This allows each device-specific reserved-memory region to enforce its
> > own limits.
> >
> > [1] https://lore.kernel.org/all/20260218-dmabuf-heap-cma-dmem-v2-0-b249886fb7b2@xxxxxxxxxx/
> > [2] https://lore.kernel.org/all/20250310-dmem-cgroups-v1-0-2984c1bc9312@xxxxxxxxxx/
> > ---
> > drivers/dma-buf/heaps/Kconfig | 17 ++
> > drivers/dma-buf/heaps/Makefile | 1 +
> > drivers/dma-buf/heaps/coherent_heap.c | 485 ++++++++++++++++++++++++++++++++++
> > include/linux/dma-heap.h | 11 +
> > kernel/dma/coherent.c | 9 +
> > 5 files changed, 523 insertions(+)
> >
> > diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig
> > index a5eef06c42264..93765dca164e3 100644
> > --- a/drivers/dma-buf/heaps/Kconfig
> > +++ b/drivers/dma-buf/heaps/Kconfig
> > @@ -12,3 +12,20 @@ config DMABUF_HEAPS_CMA
> > Choose this option to enable dma-buf CMA heap. This heap is backed
> > by the Contiguous Memory Allocator (CMA). If your system has these
> > regions, you should say Y here.
> > +
> > +config DMABUF_HEAPS_COHERENT
> > + bool "DMA-BUF Coherent Reserved-Memory Heap"
> > + depends on DMABUF_HEAPS && OF_RESERVED_MEM && DMA_DECLARE_COHERENT
> > + help
> > + Choose this option to enable coherent reserved-memory dma-buf heaps.
> > + This heap is backed by non-reusable DT "shared-dma-pool" regions.
> > + If your system defines coherent reserved-memory regions, you should
> > + say Y here.
> > +
> > +config COHERENT_AREAS_DEFERRED
> > + int "Max deferred coherent reserved-memory regions"
> > + depends on DMABUF_HEAPS_COHERENT
> > + default 16
> > + help
> > + Maximum number of coherent reserved-memory regions that can be
> > + deferred for later registration during early boot.
> > diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile
> > index 974467791032f..96bda7a65f041 100644
> > --- a/drivers/dma-buf/heaps/Makefile
> > +++ b/drivers/dma-buf/heaps/Makefile
> > @@ -1,3 +1,4 @@
> > # SPDX-License-Identifier: GPL-2.0
> > obj-$(CONFIG_DMABUF_HEAPS_SYSTEM) += system_heap.o
> > obj-$(CONFIG_DMABUF_HEAPS_CMA) += cma_heap.o
> > +obj-$(CONFIG_DMABUF_HEAPS_COHERENT) += coherent_heap.o
> > diff --git a/drivers/dma-buf/heaps/coherent_heap.c b/drivers/dma-buf/heaps/coherent_heap.c
> > new file mode 100644
> > index 0000000000000..870b2b89aefcb
> > --- /dev/null
> > +++ b/drivers/dma-buf/heaps/coherent_heap.c
> > @@ -0,0 +1,485 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * DMABUF heap for coherent reserved-memory regions
> > + *
> > + * Copyright (C) 2026 Red Hat, Inc.
> > + * Author: Albert Esteve <aesteve@xxxxxxxxxx>
> > + *
> > + */
> > +
> > +#include <linux/cgroup_dmem.h>
> > +#include <linux/dma-heap.h>
> > +#include <linux/dma-buf.h>
> > +#include <linux/dma-mapping.h>
> > +#include <linux/err.h>
> > +#include <linux/highmem.h>
> > +#include <linux/iosys-map.h>
> > +#include <linux/of_reserved_mem.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/scatterlist.h>
> > +#include <linux/slab.h>
> > +#include <linux/vmalloc.h>
> > +
> > +#define DEFERRED_AREAS_MAX CONFIG_COHERENT_AREAS_DEFERRED
>
> I'm not sure we need to make it configurable. Distros are going to set
> it to the user with the highest number of regions anyway. How about
> using MAX_RESERVED_REGIONS for now?

Makes sense, will do.

>
> >
> > [...]
> >
> > +struct coherent_heap {
> > + struct dma_heap *heap;
> > + struct reserved_mem *rmem;
> > + char *name;
> > + struct device *dev;
> > + struct platform_device *pdev;
> > +#if IS_ENABLED(CONFIG_CGROUP_DMEM)
> > + struct dmem_cgroup_region *cg;
> > +#endif
>
> We might want to leave the dmem accounting out for now so we can focus
> on the heap itself. That being said, it ended up being pretty trivial
> for CMA, so maybe it's not too much of a concern.

Sure. That allows us to follow the same patterns once the CMA series lands.
I will strip all dmem accounting parts for v2.

>
> >
> > [...]
> >
> > +static int __coherent_heap_register(struct reserved_mem *rmem)
> > +{
> > + struct dma_heap_export_info exp_info;
> > + struct coherent_heap *coh_heap;
> > +#if IS_ENABLED(CONFIG_CGROUP_DMEM)
> > + struct dmem_cgroup_region *region;
> > +#endif
> > + const char *rmem_name;
> > + int ret;
> > +
> > + if (!rmem)
> > + return -EINVAL;
> > +
> > + rmem_name = rmem->name ? rmem->name : "unknown";
>
> If the reserved region has no name, we probably shouldn't expose it to
> userspace at all. Using unknown will probably create some bugs if you
> have several, but also it's pretty like to have a name at some point and
> thus we wouldn't have a stable name for the uAPI.

Agreed. I will return an error code if no name is present.

>
> > + coh_heap = kzalloc_obj(*coh_heap);
> > + if (!coh_heap)
> > + return -ENOMEM;
> > +
> > + coh_heap->name = kasprintf(GFP_KERNEL, "coherent_%s", rmem_name);
> > + if (!coh_heap->name) {
> > + ret = -ENOMEM;
> > + goto free_coherent_heap;
> > + }
>
> Similarly, we shouldn't use the coherent_ prefix for the heap name. If
> the backing allocator ever changes (and between contiguous and coherent,
> the difference is just a single property value in the DT), then the name
> would change and userspace would break. We should directly use the name
> of the region here.
>
> > + coh_heap->rmem = rmem;
> > +
> > + /* create a platform device per rmem and bind it */
> > + coh_heap->pdev = platform_device_register_simple("coherent-heap",
> > + PLATFORM_DEVID_AUTO,
> > + NULL, 0);
> > + if (IS_ERR(coh_heap->pdev)) {
> > + ret = PTR_ERR(coh_heap->pdev);
> > + goto free_name;
> > + }
>
> We probably should use a faux_device here instead of a platform_device,
> but more importantly, the heap itself already has a device allocated for
> it (dev_ret in dma_heap_add).
>
> It's not in struct dma_heap yet, but there's a patch that moves it there
> that we should probably carry:
> https://lore.kernel.org/r/20210120210937.15069-2-john.stultz@xxxxxxxxxx/

Thanks for sharing the link! I will pick the patch.

>
> > + if (rmem->ops && rmem->ops->device_init) {
> > + ret = rmem->ops->device_init(rmem, &coh_heap->pdev->dev);
> > + if (ret)
> > + goto pdev_unregister;
> > + }
>
> I'm not really a fan of calling ops directly. Maybe we should create an
> of_reserved_mem_device_init_with_mem function that would do it for us
> (and would be called by of_reserved_mem_device_init_by_idx and the
> likes).

Agreed.

>
> > + coh_heap->dev = &coh_heap->pdev->dev;
> > +#if IS_ENABLED(CONFIG_CGROUP_DMEM)
> > + region = dmem_cgroup_register_region(rmem->size, "coh/%s", rmem_name);
> > + if (IS_ERR(region)) {
> > + ret = PTR_ERR(region);
> > + goto pdev_unregister;
> > + }
> > + coh_heap->cg = region;
> > +#endif
>
> Same comment than for CMA here: it should really be created by the
> coherent memory region itself.
>
> > + exp_info.name = coh_heap->name;
> > + exp_info.ops = &coherent_heap_ops;
> > + exp_info.priv = coh_heap;
> > +
> > + coh_heap->heap = dma_heap_add(&exp_info);
> > + if (IS_ERR(coh_heap->heap)) {
> > + ret = PTR_ERR(coh_heap->heap);
> > + goto cg_unregister;
> > + }
> > +
> > + return 0;
> > +
> > +cg_unregister:
> > +#if IS_ENABLED(CONFIG_CGROUP_DMEM)
> > + dmem_cgroup_unregister_region(coh_heap->cg);
> > +#endif
> > +pdev_unregister:
> > + platform_device_unregister(coh_heap->pdev);
> > + coh_heap->pdev = NULL;
> > +free_name:
> > + kfree(coh_heap->name);
> > +free_coherent_heap:
> > + kfree(coh_heap);
> > +
> > + return ret;
> > +}
> > +
> > +int dma_heap_coherent_register(struct reserved_mem *rmem)
> > +{
> > + int ret;
> > +
> > + ret = __coherent_heap_register(rmem);
> > + if (ret == -ENOMEM)
> > + return coherent_heap_add_deferred(rmem);
> > + return ret;
> > +}
>
> I appreciate you did it like we did for CMA, but if we ever want to make
> that heap a module you'll end up with a dependency from the core kernel
> on a module which doesn't work. The best here might be to wait a bit
> until we have somewhat of an agreement on
>
> https://lore.kernel.org/r/20260225-dma-buf-heaps-as-modules-v1-0-2109225a090d@xxxxxxxxxx
>
> > +static int __init coherent_heap_register_deferred(void)
> > +{
> > + unsigned int i;
> > + int ret;
> > +
> > + for (i = 0; i < rmem_areas_deferred_num; i++) {
> > + struct reserved_mem *rmem = rmem_areas_deferred[i];
> > +
> > + ret = __coherent_heap_register(rmem);
> > + if (ret) {
> > + pr_warn("Failed to add coherent heap %s",
> > + rmem->name ? rmem->name : "unknown");
> > + continue;
> > + }
> > + }
> > +
> > + return 0;
> > +}
> > +late_initcall(coherent_heap_register_deferred);
>
> Why do you need a late_initcall here? Isn't module_init enough?

When I tested this initially, I relied on direct invocations from
cma/coherent.c to register new coherent heap areas. However it failed
in `kzalloc_obj` calls within the register function. Then I read the
article about boot time memory management[1] and saw other drivers
collected info for deferred initialization at late_initcall(), similar
to what I tried to do here. I honestly did not try with module_init().
Since I will refactor this part to follow your previous comments, I
will try to update if possible.

[1] https://docs.kernel.org/core-api/boot-time-mm.html

>
> > +MODULE_DESCRIPTION("DMA-BUF heap for coherent reserved-memory regions");
> > diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
> > index 648328a64b27e..e894cfa1ecf1a 100644
> > --- a/include/linux/dma-heap.h
> > +++ b/include/linux/dma-heap.h
> > @@ -9,9 +9,11 @@
> > #ifndef _DMA_HEAPS_H
> > #define _DMA_HEAPS_H
> >
> > +#include <linux/errno.h>
> > #include <linux/types.h>
> >
> > struct dma_heap;
> > +struct reserved_mem;
> >
> > /**
> > * struct dma_heap_ops - ops to operate on a given heap
> > @@ -48,4 +50,13 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info);
> >
> > extern bool mem_accounting;
> >
> > +#if IS_ENABLED(CONFIG_DMABUF_HEAPS_COHERENT)
> > +int dma_heap_coherent_register(struct reserved_mem *rmem);
> > +#else
> > +static inline int dma_heap_coherent_register(struct reserved_mem *rmem)
> > +{
> > + return -EOPNOTSUPP;
> > +}
> > +#endif
> > +
> > #endif /* _DMA_HEAPS_H */
> > diff --git a/kernel/dma/coherent.c b/kernel/dma/coherent.c
> > index 1147497bc512c..f49d13e460e4b 100644
> > --- a/kernel/dma/coherent.c
> > +++ b/kernel/dma/coherent.c
> > @@ -9,6 +9,7 @@
> > #include <linux/module.h>
> > #include <linux/dma-direct.h>
> > #include <linux/dma-map-ops.h>
> > +#include <linux/dma-heap.h>
> >
> > struct dma_coherent_mem {
> > void *virt_base;
> > @@ -393,6 +394,14 @@ static int __init rmem_dma_setup(struct reserved_mem *rmem)
> > rmem->ops = &rmem_dma_ops;
> > pr_info("Reserved memory: created DMA memory pool at %pa, size %ld MiB\n",
> > &rmem->base, (unsigned long)rmem->size / SZ_1M);
> > +
> > + if (IS_ENABLED(CONFIG_DMABUF_HEAPS_COHERENT)) {
> > + int ret = dma_heap_coherent_register(rmem);
> > +
> > + if (ret)
> > + pr_warn("Reserved memory: failed to register coherent heap for %s (%d)\n",
> > + rmem->name ? rmem->name : "unknown", ret);
> > + }
>
> I think this should be split into a patch of its own. It's going to be
> reviewed (and possibly merged) by another maintainer, through another
> tree.

That's fine. I will split it into another series then. I guess I can
send it with a Based-On: tag or similar to link two series together?

BR,
Albert

>
> Maxime