Re: [PATCH] dma-buf: system_heap: Optimize sg_table-to-pages conversion in vmap
From: Christian König
Date: Wed Apr 22 2026 - 03:13:04 EST
On 4/7/26 13:29, Barry Song wrote:
> On Tue, Apr 7, 2026 at 3:58 PM Christian König <christian.koenig@xxxxxxx> wrote:
>>
>> On 4/6/26 23:49, Barry Song (Xiaomi) wrote:
>>> From: Xueyuan Chen <Xueyuan.chen21@xxxxxxxxx>
>>>
>>> Replace the heavy for_each_sgtable_page() iterator in system_heap_do_vmap()
>>> with a more efficient nested loop approach.
>>>
>>> Instead of iterating page by page, we now iterate through the scatterlist
>>> entries via for_each_sgtable_sg(). Because pages within a single sg entry
>>> are physically contiguous, we can populate the page array with a in an
>>> inner loop using simple pointer math. This save a lot of time.
>>>
>>> The WARN_ON check is also pulled out of the loop to save branch
>>> instructions.
>>>
>>> Performance results mapping a 2GB buffer on Radxa O6:
>>> - Before: ~1440000 ns
>>> - After: ~232000 ns
>>> (~84% reduction in iteration time, or ~6.2x faster)
>>
>> Well real question is why do you care about the vmap performance?
>>
>> That should basically only be used for fbdev emulation (except for VMGFX) and we absolutely don't care about performance there.
>
> I agree that in mainline, dma_buf_vmap is not used very often.
> Here’s what I was able to find:
>
> 1 1638 drivers/dma-buf/dma-buf.c <<dma_buf_vmap_unlocked>>
> ret = dma_buf_vmap(dmabuf, map);
> 2 376 drivers/gpu/drm/drm_gem_shmem_helper.c
> <<drm_gem_shmem_vmap_locked>>
> ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
> 3 85 drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> <<etnaviv_gem_prime_vmap_impl>>
> ret = dma_buf_vmap(etnaviv_obj->base.import_attach->dmabuf, &map);
> 4 433 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c <<map_external>>
> ret = dma_buf_vmap(bo->tbo.base.dma_buf, map);
> 5 88 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c <<vmw_gem_vmap>>
> ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
>
> However, in the Android ecosystem, system_heap and similar heaps
> are widely used across camera, NPU, and media drivers. Many of these
> drivers are not in mainline but do use vmap() in real code paths.
Well out of tree drivers are not a justification to make an upstream changes.
Apart from a handful of workarounds which need to CPU access as fallback DMA-buf vmap is only used to provide fb dev emulation.
The vmap interface has already given us quite a headache in the first place and there are a couple of unresolved problems regarding synchronization and coherency.
When a driver would be pushed upstream which makes so frequent use of the dma_buf_vmap function that it matters for the performance I think there would be push back on that and the driver developer would require a very good explanation why that is necessary.
So for now I have to reject that patch.
Regards,
Christian.
>
> As I can show you some of them from MTK platforms:
>
> 1:
> [ 6.689849] system_heap_vmap+0x17c/0x254 [system_heap
> 8d35d4ce35bb30d8a623f0b9863998a2528e4175]
> [ 6.689859] dma_buf_vmap_unlocked+0xb8/0x130
> [ 6.689861] aov_core_init+0x310/0x718 [mtk_aov
> 96e2e5e9457dcdacce3a7629b0600c5dbeca623b]
> [ 6.689873] mtk_aov_probe+0x434/0x5b4 [mtk_aov
> 96e2e5e9457dcdacce3a7629b0600c5dbeca623b]
>
> 2:
> [ 116.181643] __vmap_pages_range_noflush+0x7c4/0x814
> [ 116.181645] vmap+0xb4/0x148
> [ 116.181647] system_heap_vmap+0x17c/0x254 [system_heap
> 8d35d4ce35bb30d8a623f0b9863998a2528e4175]
> [ 116.181651] dma_buf_vmap_unlocked+0xb8/0x130
> [ 116.181653] mtk_cam_vb2_vaddr+0xa0/0xfc [mtk_cam_isp8s
> 0cf9be6c773a8f14aab9db9ebf53feacb499846a]
> [ 116.181682] vb2_plane_vaddr+0x5c/0x78
> [ 116.181684] mtk_cam_job_fill_ipi_frame+0xa8c/0x128c [mtk_cam_isp8s
> 0cf9be6c773a8f14aab9db9ebf53feacb499846a]
>
> 3:
> [ 116.306178] __vmap_pages_range_noflush+0x7c4/0x814
> [ 116.306183] vmap+0xb4/0x148
> [ 116.306187] system_heap_vmap+0x17c/0x254 [system_heap
> 8d35d4ce35bb30d8a623f0b9863998a2528e4175]
> [ 116.306209] dma_buf_vmap_unlocked+0xb8/0x130
> [ 116.306212] apu_sysmem_alloc+0x168/0x360 [apusys
> 8fb33cbce3b858d651b9da26fc370090a67cfb70]
> [ 116.306468] mdw_mem_alloc+0xd8/0x314 [apusys
> 8fb33cbce3b858d651b9da26fc370090a67cfb70]
> [ 116.306591] mdw_mem_pool_chunk_add+0x11c/0x400 [apusys
> 8fb33cbce3b858d651b9da26fc370090a67cfb70]
> [ 116.306712] mdw_mem_pool_create+0x190/0x2c8 [apusys
> 8fb33cbce3b858d651b9da26fc370090a67cfb70]
> [ 116.306833] mdw_drv_open+0x21c/0x47c [apusys
> 8fb33cbce3b858d651b9da26fc370090a67cfb70]
>
> While we may want to encourage more of these drivers to upstream,
> some aspects are beyond our control (different SoC vendors), but we
> can at least contribute upstream ourselves.
>
> Best Regards
> Barry