Re: [PATCH] dma-buf: system_heap: Optimize sg_table-to-pages conversion in vmap

From: T.J. Mercier

Date: Fri May 01 2026 - 11:59:47 EST

On Thu, Apr 30, 2026 at 9:15 PM Barry Song <baohua@xxxxxxxxxx> wrote:
>
> On Wed, Apr 22, 2026 at 3:10 PM Christian König
> <christian.koenig@xxxxxxx> wrote:
> >
> > On 4/7/26 13:29, Barry Song wrote:
> > > On Tue, Apr 7, 2026 at 3:58 PM Christian König <christian.koenig@xxxxxxx> wrote:
> > >>
> > >> On 4/6/26 23:49, Barry Song (Xiaomi) wrote:
> > >>> From: Xueyuan Chen <Xueyuan.chen21@xxxxxxxxx>
> > >>>
> > >>> Replace the heavy for_each_sgtable_page() iterator in system_heap_do_vmap()
> > >>> with a more efficient nested loop approach.
> > >>>
> > >>> Instead of iterating page by page, we now iterate through the scatterlist
> > >>> entries via for_each_sgtable_sg(). Because pages within a single sg entry
> > >>> are physically contiguous, we can populate the page array with a in an
> > >>> inner loop using simple pointer math. This save a lot of time.
> > >>>
> > >>> The WARN_ON check is also pulled out of the loop to save branch
> > >>> instructions.
> > >>>
> > >>> Performance results mapping a 2GB buffer on Radxa O6:
> > >>> - Before: ~1440000 ns
> > >>> - After: ~232000 ns
> > >>> (~84% reduction in iteration time, or ~6.2x faster)
> > >>
> > >> Well real question is why do you care about the vmap performance?
> > >>
> > >> That should basically only be used for fbdev emulation (except for VMGFX) and we absolutely don't care about performance there.
> > >
> > > I agree that in mainline, dma_buf_vmap is not used very often.
> > > Here’s what I was able to find:
> > >
> > > 1 1638 drivers/dma-buf/dma-buf.c <<dma_buf_vmap_unlocked>>
> > > ret = dma_buf_vmap(dmabuf, map);
> > > 2 376 drivers/gpu/drm/drm_gem_shmem_helper.c
> > > <<drm_gem_shmem_vmap_locked>>
> > > ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
> > > 3 85 drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> > > <<etnaviv_gem_prime_vmap_impl>>
> > > ret = dma_buf_vmap(etnaviv_obj->base.import_attach->dmabuf, &map);
> > > 4 433 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c <<map_external>>
> > > ret = dma_buf_vmap(bo->tbo.base.dma_buf, map);
> > > 5 88 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c <<vmw_gem_vmap>>
> > > ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
> > >
> > > However, in the Android ecosystem, system_heap and similar heaps
> > > are widely used across camera, NPU, and media drivers. Many of these
> > > drivers are not in mainline but do use vmap() in real code paths.
> >
> > Well out of tree drivers are not a justification to make an upstream changes.
> >
> > Apart from a handful of workarounds which need to CPU access as fallback DMA-buf vmap is only used to provide fb dev emulation.
> >
> > The vmap interface has already given us quite a headache in the first place and there are a couple of unresolved problems regarding synchronization and coherency.
> >
> > When a driver would be pushed upstream which makes so frequent use of the dma_buf_vmap function that it matters for the performance I think there would be push back on that and the driver developer would require a very good explanation why that is necessary.
> >
> > So for now I have to reject that patch.
>
> Well, it doesn’t seem to increase complexity, and the code is quite easy
> to understand.

I agree with this. This change introduces basically no downsides for
upstream, even if it primarily benefits a rare use case. Since
dma_buf_vmap is exported for driver use, why not enhance the
performance for all callers?

-T.J.

> It would be great if the community could be more welcoming
> to developers who are just getting involved, rather than discouraging them.
>
> Apparently, no one can control whether the source code of those kernel
> modules will be upstreamed except the vendors themselves, but products
> can still benefit from the common kernel.
>
> Best Regards
> Barry