[PATCH v3 0/5] dma-mapping: arm64: support batched cache sync
From: Barry Song
Date: Sat Feb 28 2026 - 17:11:51 EST
From: Barry Song <baohua@xxxxxxxxxx>
Many embedded ARM64 SoCs still lack hardware cache coherency support, which
causes DMA mapping operations to appear as hotspots in on-CPU flame graphs.
For an SG list with *nents* entries, the current dma_map/unmap_sg() and DMA
sync APIs perform cache maintenance one entry at a time. After each entry,
the implementation synchronously waits for the corresponding region’s
D-cache operations to complete. On architectures like arm64, efficiency can
be improved by issuing all entries’ operations first and then performing a
single batched wait for completion.
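As an illustration only (this is not code from the series), the difference
between the two schemes can be modeled in userspace C by counting how many
completion waits each one performs. All names below (sync_stats, sg_entry,
issue_region_op, wait_for_sync_done, sync_sg_per_entry, sync_sg_batched)
are hypothetical stand-ins, and no real cache maintenance is done:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters; nothing here touches real caches. */
struct sync_stats {
	unsigned long region_ops;	/* per-entry maintenance operations issued */
	unsigned long waits;		/* completion waits (a DSB on arm64) */
};

/* Hypothetical, heavily simplified stand-in for a scatterlist entry. */
struct sg_entry {
	size_t len;
};

/* Model of issuing the D-cache operations for one region, without waiting. */
void issue_region_op(struct sync_stats *st, const struct sg_entry *e)
{
	assert(st && e);
	st->region_ops++;
}

/* Model of waiting for all previously issued operations to complete. */
void wait_for_sync_done(struct sync_stats *st)
{
	assert(st);
	st->waits++;
}

/* Current behaviour as described above: issue, then wait, for every entry. */
void sync_sg_per_entry(struct sync_stats *st, const struct sg_entry *sg,
		       size_t nents)
{
	for (size_t i = 0; i < nents; i++) {
		issue_region_op(st, &sg[i]);
		wait_for_sync_done(st);
	}
}

/* Batched behaviour: issue everything first, then wait exactly once. */
void sync_sg_batched(struct sync_stats *st, const struct sg_entry *sg,
		     size_t nents)
{
	for (size_t i = 0; i < nents; i++)
		issue_region_op(st, &sg[i]);
	if (nents)
		wait_for_sync_done(st);
}
```

In this model, an SG list with nents entries costs nents waits in the
per-entry scheme and a single wait in the batched scheme, while the number
of issued region operations stays the same.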
Tangquan's results show that batched synchronization reduces
dma_map_sg() time by 64.61% and dma_unmap_sg() time by 66.60% on an MTK
phone platform (MediaTek Dimensity 9500). The tests pinned the task to
CPU7, fixed the CPU frequency at 2.6 GHz, and ran dma_map_sg() and
dma_unmap_sg() on 10 MB buffers (10 MB / 4 KB = 2560 sg entries per
buffer) for 200 iterations, then averaged the results.
Thanks to Xueyuan for volunteering to take on the testing tasks. He
put significant effort into validating paths such as IOVA link/unlink
and SWIOTLB on RK3588 boards with NVMe.
v3:
* Fold patches 5/8, 7/8, and 8/8 into patch 4/8 as suggested by Leon,
reducing the series from 8 patches to 5;
* Fix the SWIOTLB path by ensuring a sync is issued before memcpy;
* Add ARCH_HAS_BATCHED_DMA_SYNC Kconfig as suggested by Leon;
* Collect Reviewed-by tags from Leon and Juergen. Leon's tag is not
added to patch 4 since it has changed significantly since v2 and
requires re-review;
* Rename some asm macros and functions as suggested by Will;
* Add Xueyuan's Tested-by. His help is greatly appreciated!
v2 link:
https://lore.kernel.org/lkml/20251226225254.46197-1-21cnbao@xxxxxxxxx/
v2:
* Refine a large amount of arm64 asm code based on feedback from
Robin, thanks!
* Drop batch_add APIs and always use arch_sync_dma_for_* + flush,
even for a single buffer, based on Leon’s suggestion, thanks!
* Refine a large amount of code based on feedback from Leon, thanks!
* Also add batch support for iommu_dma_sync_sg_for_{cpu,device}
v1 link:
https://lore.kernel.org/lkml/20251219053658.84978-1-21cnbao@xxxxxxxxx/
v1, changes since RFC:
* Drop a large number of #ifdef/#else/#endif blocks based on feedback
from Catalin and Marek, thanks!
* Also add batched iova link/unlink support, marked as RFC since I lack
the required hardware. This was suggested by Marek, thanks!
RFC link:
https://lore.kernel.org/lkml/20251029023115.22809-1-21cnbao@xxxxxxxxx/
Barry Song (5):
  arm64: Provide dcache_by_myline_op_nosync helper
  arm64: Provide dcache_clean_poc_nosync helper
  arm64: Provide dcache_inval_poc_nosync helper
  dma-mapping: Separate DMA sync issuing and completion waiting
  dma-mapping: Support batch mode for dma_direct_{map,unmap}_sg
 arch/arm64/Kconfig                  |  1 +
 arch/arm64/include/asm/assembler.h  | 25 ++++++++++---
 arch/arm64/include/asm/cache.h      |  5 +++
 arch/arm64/include/asm/cacheflush.h |  2 +
 arch/arm64/kernel/relocate_kernel.S |  3 +-
 arch/arm64/mm/cache.S               | 57 +++++++++++++++++++++++------
 arch/arm64/mm/dma-mapping.c         |  4 +-
 drivers/iommu/dma-iommu.c           | 35 ++++++++++++++----
 drivers/xen/swiotlb-xen.c           | 24 ++++++++----
 include/linux/dma-map-ops.h         |  6 +++
 kernel/dma/Kconfig                  |  3 ++
 kernel/dma/direct.c                 | 23 +++++++++---
 kernel/dma/direct.h                 | 21 ++++++++---
 kernel/dma/mapping.c                |  6 +--
 kernel/dma/swiotlb.c                |  7 +++-
 15 files changed, 171 insertions(+), 51 deletions(-)
Cc: Leon Romanovsky <leon@xxxxxxxxxx>
Cc: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Ada Couprie Diaz <ada.coupriediaz@xxxxxxx>
Cc: Ard Biesheuvel <ardb@xxxxxxxxxx>
Cc: Marc Zyngier <maz@xxxxxxxxxx>
Cc: Anshuman Khandual <anshuman.khandual@xxxxxxx>
Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
Cc: Robin Murphy <robin.murphy@xxxxxxx>
Cc: Joerg Roedel <joro@xxxxxxxxxx>
Cc: Juergen Gross <jgross@xxxxxxxx>
Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>
Cc: Tangquan Zheng <zhengtangquan@xxxxxxxx>
Cc: Huacai Zhou <zhouhuacai@xxxxxxxx>
Cc: Xueyuan Chen <xueyuan.chen21@xxxxxxxxx>
--
2.39.3 (Apple Git-146)