Re: [PATCH v3 5/6] dma-buf: heaps: Add Coherent heap to dmabuf heaps

From: Marek Szyprowski

Date: Thu Apr 02 2026 - 02:57:57 EST

On 16.03.2026 13:08, Maxime Ripard wrote:
> On Wed, Mar 11, 2026 at 08:18:28AM -0500, Andrew Davis wrote:
>> On 3/11/26 5:19 AM, Albert Esteve wrote:
>>> On Tue, Mar 10, 2026 at 4:34 PM Andrew Davis <afd@xxxxxx> wrote:
>>>> On 3/6/26 4:36 AM, Albert Esteve wrote:
>>>>> Expose DT coherent reserved-memory pools ("shared-dma-pool"
>>>>> without "reusable") as dma-buf heaps, creating one heap per
>>>>> region so userspace can allocate from the exact device-local
>>>>> pool intended for coherent DMA.
>>>>>
>>>>> This is a missing backend in the long-term effort to steer
>>>>> userspace buffer allocations (DRM, v4l2, dma-buf heaps)
>>>>> through heaps for clearer cgroup accounting. CMA and system
>>>>> heaps already exist; non-reusable coherent reserved memory
>>>>> did not.
>>>>>
>>>>> The heap binds the heap device to each memory region so
>>>>> coherent allocations use the correct dev->dma_mem, and
>>>>> it defers registration until module_init when normal
>>>>> allocators are available.
>>>>>
>>>>> Signed-off-by: Albert Esteve <aesteve@xxxxxxxxxx>
>>>>> ---
>>>>> drivers/dma-buf/heaps/Kconfig | 9 +
>>>>> drivers/dma-buf/heaps/Makefile | 1 +
>>>>> drivers/dma-buf/heaps/coherent_heap.c | 414 ++++++++++++++++++++++++++++++++++
>>>>> 3 files changed, 424 insertions(+)
>>>>>
>>>>> (...)
>>>> You are doing this DMA allocation using a non-DMA pseudo-device (heap_dev).
>>>> This is why you need to do that dma_coerce_mask_and_coherent(64) nonsense, you
>>>> are doing a DMA alloc for the CPU itself. This might still work, but only if
>>>> dma_map_sgtable() can handle swiotlb/iommu for all attaching devices at map
>>>> time.
>>> The concern is valid. We're allocating via a synthetic device, which
>>> ties the allocation to that device's DMA domain. I looked deeper into
>>> this trying to address the concern.
>>>
>>> The approach works because dma_map_sgtable() handles both
>>> dma_map_direct and use_dma_iommu cases in __dma_map_sg_attrs(). For
>>> each physical address in the sg_table (extracted via sg_phys()), it
>>> creates device-specific DMA mappings:
>>> - For direct mapping: it checks if the address is directly accessible
>>> (dma_capable()), and if not, it falls back to swiotlb.
>>> - For IOMMU: it creates mappings that allow the device to access
>>> physical addresses.
>>>
>>> This means every attached device gets its own device-specific DMA
>>> mapping, properly handling cases where the physical addresses are
>>> inaccessible or have DMA constraints.
>>>
>> While this means it might still "work" it won't always be ideal. Take
>> the case where the consuming device(s) have a 32bit address restriction,
>> if the allocation was done using the real devices then the backing buffer
>> itself would be allocated in <32bit mem. Whereas here the allocation
>> could end up in >32bit mem, as the CPU/synthetic device supports that.
>> Then each mapping device would instead get a bounce buffer.
>>
>> (this example might not be great as we usually know the address of
>> carveout/reserved memory regions, but substitute in whatever restriction
>> makes more sense)
>>
>> These non-reusable carveouts tend to be made for some specific device, and
>> they are made specifically because that device has some memory restriction.
>> So we might run into the situation above more than one would expect.
>>
>> Not a blocker here, but just something worth thinking on.
> As I detailed in the previous version [1] the main idea behind that work
> is to allow to get rid of dma_alloc_attrs for framework and drivers to
> allocate from the heaps instead.
>
> Robin was saying he wasn't comfortable with exposing this heap to
> userspace, and we're saying here that maybe this might not always work
> anyway (or at least that we couldn't test it fully).
>
> Maybe the best thing is to defer this series until we are at a point
> where we can start enabling the "heap allocations" in frameworks then?
> Hopefully we will have hardware to test it with by then, and we might
> not even need to expose it to userspace at all but only to the kernel.
>
> What do you think?

IMHO a good idea. Maybe in-kernel heap for the coherent allocations will
be just enough.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland