Re: [PATCH v3 0/5] dma-mapping: arm64: support batched cache sync
From: Catalin Marinas
Date: Fri Mar 13 2026 - 15:37:31 EST
On Tue, Mar 03, 2026 at 05:33:37PM +0100, Marek Szyprowski wrote:
> On 28.02.2026 23:11, Barry Song wrote:
> > From: Barry Song <baohua@xxxxxxxxxx>
> >
> > Many embedded ARM64 SoCs still lack hardware cache coherency support, which
> > causes DMA mapping operations to appear as hotspots in on-CPU flame graphs.
> >
> > For an SG list with *nents* entries, the current dma_map/unmap_sg() and DMA
> > sync APIs perform cache maintenance one entry at a time. After each entry,
> > the implementation synchronously waits for the corresponding region’s
> > D-cache operations to complete. On architectures like arm64, efficiency can
> > be improved by issuing all entries’ operations first and then performing a
> > single batched wait for completion.
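To illustrate the paragraph above, here is a minimal user-space sketch of the two strategies. It is not code from the patch series: struct sg_entry, cache_op_issue(), cache_op_wait() and the counters are hypothetical stand-ins that only count how many maintenance operations are issued and how many completion waits are performed. On arm64 the wait would presumably correspond to a barrier after the cache maintenance instructions, but that mapping is an assumption here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one scatter-gather entry. */
struct sg_entry {
	size_t addr;
	size_t len;
};

/* Counters that model the cost; real code would touch hardware instead. */
static unsigned long ops_issued;
static unsigned long waits_done;

static void stats_reset(void)
{
	ops_issued = 0;
	waits_done = 0;
}

/* Model: start cache maintenance for one region without waiting. */
static void cache_op_issue(const struct sg_entry *e)
{
	(void)e;
	ops_issued++;
}

/* Model: wait until all previously issued maintenance has completed. */
static void cache_op_wait(void)
{
	waits_done++;
}

/* Current behaviour as described: one wait after every entry. */
static void sync_per_entry(const struct sg_entry *sg, size_t nents)
{
	assert(sg != NULL || nents == 0);
	for (size_t i = 0; i < nents; i++) {
		cache_op_issue(&sg[i]);
		cache_op_wait();
	}
}

/* Batched behaviour: issue everything first, then wait once. */
static void sync_batched(const struct sg_entry *sg, size_t nents)
{
	assert(sg != NULL || nents == 0);
	for (size_t i = 0; i < nents; i++)
		cache_op_issue(&sg[i]);
	if (nents > 0)
		cache_op_wait();
}
```

In this model, an SG list with nents entries costs nents waits in the per-entry variant and a single wait in the batched variant, while the number of issued operations stays the same.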
> >
> > Tangquan's results show that batched synchronization can reduce
> > dma_map_sg() time by 64.61% and dma_unmap_sg() time by 66.60% on an MTK
> > phone platform (MediaTek Dimensity 9500). The tests were performed by
> > pinning the task to CPU7 and fixing the CPU frequency at 2.6 GHz,
> > running dma_map_sg() and dma_unmap_sg() on 10 MB buffers (2560 sg entries
> > of 4 KB each per buffer) for 200 iterations and then averaging the
> > results.
> >
> > Thanks to Xueyuan for volunteering to take on the testing tasks. He
> > put significant effort into validating paths such as IOVA link/unlink
> > and SWIOTLB on RK3588 boards with NVMe.
>
> Catalin, Will, I would like to merge this into the dma-mapping tree. Please
> give your ack or comment if you are okay with the arm64-related parts.
Sorry for the delay. Yes, feel free to pick them up. I doubt there would
be any conflicts in this area with what I'm merging through the arm64
tree.
Thanks.
--
Catalin