Re: [RFC PATCH] iommu: Default to lazy DMA mode on ARM64
From: Robin Murphy
Date: Tue Apr 07 2026 - 07:25:56 EST
On 02/04/2026 8:59 pm, Nafees Ahmed Abdul wrote:
> ARM64 currently falls through to IOMMU_DEFAULT_DMA_STRICT, while
> X86 defaults to IOMMU_DEFAULT_DMA_LAZY. On ARM64 bare-metal
> systems with the ARM SMMU, strict mode causes synchronous TLBI
> + CMD_SYNC on every DMA unmap, resulting in significant
> throughput degradation for network-intensive workloads.
>
> Benchmarked on an ARM64 bare-metal system (AWS m8g.metal-24xl)
> running Debian 13 with kernel 6.12.74, using iperf3:
>
>   STRICT (default): 14.9 Gbps
>   LAZY:             39.8 Gbps
>
> This is a 2.67x throughput improvement simply by switching the
> IOMMU default domain mode.
>
> Distributions that do not explicitly override this Kconfig
> choice (e.g., Debian, SLES) silently get STRICT on ARM64,
> causing this regression on bare-metal systems.
It is not a "regression"; it has always been this way since the beginning of IOMMU support on arm64. For many years, we didn't even have such a thing as lazy mode.
> Changing the
> upstream default avoids the need for each distribution to
> independently carry this override.
...while equally *creating* that need for all the distros/users who do value security/robustness above performance. Who's to say what matters most? Besides, defconfig is never meant to be a distro config; distros *should* maintain their own configs, and if they're not delivering the options that the majority of their users want, that's between the distros and their users.
The numbers game goes both ways too: the arm64 systems where strict vs. lazy makes no noticeable performance difference, but which do get that small robustness benefit (e.g. embedded/mobile), outnumber the arm64 systems capable of 50GbE by many orders of magnitude. Even your own data suggest this is a pretty niche case: strict mode still sustained 14.9 Gbps, so even 10GbE systems would have plenty of headroom to keep up in strict mode. If anything, that's actually pretty impressive!
> Add ARM64 to the LAZY default to align with X86 behavior.
But the other side of that is that the x86 (and S390) behaviour is a 20-year-old legacy which arguably only looks more and more anachronistic in today's post-Spectre/etc. security-conscious world. Wouldn't an even better alignment argument be to start cleaning up such legacy, rather than spread it further onto more modern architectures which never even had it?
Thanks,
Robin.
> Signed-off-by: Nafees Ahmed Abdul <nafeabd@xxxxxxxxxx>
> ---
>  drivers/iommu/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index f86262b11..2822aba75 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -96,7 +96,7 @@ config IOMMU_DEBUGFS
>  choice
>  	prompt "IOMMU default domain type"
>  	depends on IOMMU_API
> -	default IOMMU_DEFAULT_DMA_LAZY if X86 || S390
> +	default IOMMU_DEFAULT_DMA_LAZY if X86 || S390 || ARM64
>  	default IOMMU_DEFAULT_DMA_STRICT
>  	help
>  	  Choose the type of IOMMU domain used to manage DMA API usage by