Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
From: Jason Gunthorpe
Date: Thu Feb 26 2026 - 20:03:32 EST
On Thu, Feb 26, 2026 at 09:40:10PM +0100, Antheas Kapenekakis wrote:
> I am still concerned about unaligned checks. It is a functional change
> that can cause regressions in all devices. The approach of this patch
> does not affect behavior in other devices. I would like for Jason to
> weigh in.
I think Robin's solution is very clever, but I share the concern
regarding what all the implementations do.
So, I fed this question to Claude. It did find two counterpoints (see
below for the whole report I had it generate):
Implementations that lose the offset
s390-iommu (drivers/iommu/s390-iommu.c:989): After the 3-level ZPCI walk,
returns pte & ZPCI_PTE_ADDR_MASK with no sub-page offset added back.
iova_to_phys(0) and iova_to_phys(1) return the same page-aligned PA.
mtk_iommu_v1 (drivers/iommu/mtk_iommu_v1.c:396): Looks up the PTE by
iova >> PAGE_SHIFT (discarding offset), then returns pte & ~(page_size-1). No
step adds the sub-page offset back.
I checked these two myself and they seem correct. I didn't try to
confirm that the cases it says are OK are in fact OK, but it paints a
convincing picture.
I doubt S390 uses this function you are fixing, and I have no idea
about mtk. Below are also the diffs Claude came up with to fix them; I
didn't try to check them.
So, I'd say if Robin is OK with these outliers then it is a good
approach.
Jason
iova_to_phys Implementation Survey
Entry point: iommu_iova_to_phys() in drivers/iommu/iommu.c:2502 calls
domain->ops->iova_to_phys(domain, iova) via iommu_domain_ops.
Category 1 — Delegates to io-pgtable
These drivers hold an io_pgtable_ops * and call ops->iova_to_phys(ops, iova).
The actual walk happens in one of the io-pgtable backends listed in Category
4.
----------------------------------------------------------------------------------------------------------
Driver Function (file:line) Ops assignment Notes
------------- -------------------------------------------------- ----------------------- -----------------
arm-smmu-v3 arm_smmu_iova_to_phys :3767 Pure delegation
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3471
arm-smmu arm_smmu_iova_to_phys :1655 S1 with
v1/v2 drivers/iommu/arm/arm-smmu/arm-smmu.c:1387 FEAT_TRANS_OPS
uses hw ATS1PR
registers
(CB_PAR),
otherwise
io-pgtable
apple-dart apple_dart_iova_to_phys :1021 Pure delegation →
drivers/iommu/apple-dart.c:531 io-pgtable-dart
qcom_iommu qcom_iommu_iova_to_phys :605 Delegation with
drivers/iommu/arm/arm-smmu/qcom_iommu.c:492 spinlock
ipmmu-vmsa ipmmu_iova_to_phys :895 Uses
drivers/iommu/ipmmu-vmsa.c:702 ARM_32_LPAE_S1
format
mtk_iommu mtk_iommu_iova_to_phys :1073 Delegation + 4GB
drivers/iommu/mtk_iommu.c:861 mode PA remap
fixup
----------------------------------------------------------------------------------------------------------
Category 2 — Open-coded page table walk
These drivers implement their own page table traversal without io-pgtable.
---------------------------------------------------------------------------------------------------------
Driver Function (file:line) Ops assignment Walk structure
---------------- ------------------------------------ -------------------- ------------------------------
sun50i-iommu sun50i_iommu_iova_to_phys :860 2-level (DTE → PTE)
drivers/iommu/sun50i-iommu.c:662
exynos-iommu exynos_iommu_iova_to_phys :1487 2-level (section/large/small
drivers/iommu/exynos-iommu.c:1375 page)
riscv-iommu riscv_iommu_iova_to_phys :1355 Sv39/48/57 via
drivers/iommu/riscv/iommu.c:1280 riscv_iommu_pte_fetch (:1166)
omap-iommu omap_iommu_iova_to_phys :1727 iopgtable_lookup_entry helper
drivers/iommu/omap-iommu.c:1596 (super
section/section/large/small)
rockchip-iommu rk_iommu_iova_to_phys :1190 2-level (DTE → PTE)
drivers/iommu/rockchip-iommu.c:651
msm_iommu msm_iommu_iova_to_phys :709 Hardware walk: writes VA to
drivers/iommu/msm_iommu.c:526 V2PPR register, reads PA from
PAR register
s390-iommu s390_iommu_iova_to_phys :1186 3-level ZPCI (region → segment
drivers/iommu/s390-iommu.c:989 → page)
tegra-smmu tegra_smmu_iova_to_phys :1010 2-level via
drivers/iommu/tegra-smmu.c:806 tegra_smmu_pte_lookup
mtk_iommu_v1 mtk_iommu_v1_iova_to_phys :593 Flat single-level table
drivers/iommu/mtk_iommu_v1.c:396
sprd-iommu sprd_iommu_iova_to_phys :423 Flat single-level table
drivers/iommu/sprd-iommu.c:369
---------------------------------------------------------------------------------------------------------
Category 3 — Special / trivial
-------------------------------------------------------------------------------------------
Driver Function (file:line) Ops assignment Mechanism
-------------- ------------------------------------- ---------------------- ---------------
fsl_pamu fsl_pamu_iova_to_phys :438 Identity:
drivers/iommu/fsl_pamu_domain.c:172 returns iova
(after aperture
bounds check)
virtio-iommu viommu_iova_to_phys :1105 Interval tree
drivers/iommu/virtio-iommu.c:915 reverse lookup
(no page table)
-------------------------------------------------------------------------------------------
Category 4 — io_pgtable_ops backends
These implement struct io_pgtable_ops.iova_to_phys and are the ultimate walk
functions called by Category 1 drivers.
--------------------------------------------------------------------------------------------------
Backend Function (file:line) Ops assignment Walk strategy
----------- ---------------------------------------- -------------------- ------------------------
ARM LPAE arm_lpae_iova_to_phys :950 Visitor pattern via
(64-bit) drivers/iommu/io-pgtable-arm.c:734 __arm_lpae_iopte_walk;
covers ARM_64_LPAE_S1,
S2, ARM_MALI_LPAE
ARM v7s arm_v7s_iova_to_phys :716 Iterative do-while
(32-bit) drivers/iommu/io-pgtable-arm-v7s.c:644 2-level; handles
contiguous entries
Apple DART dart_iova_to_phys :402 dart_get_last pre-walks
drivers/iommu/io-pgtable-dart.c:336 to leaf table, then
single lookup
--------------------------------------------------------------------------------------------------
Category 5 — generic_pt framework
All these drivers use IOMMU_PT_DOMAIN_OPS(fmt), which routes iova_to_phys into
the template function pt_iommu_<fmt>_iova_to_phys at
drivers/iommu/generic_pt/iommu_pt.h:170. The walk uses pt_walk_range +
PT_MAKE_LEVELS to generate a fully-inlined, unrolled per-level walk; the OA is
extracted via pt_entry_oa_exact.
---------------------------------------------------------------------------------
Driver Ops struct (file:line) Format
---------------- ----------------------------------------------- ----------------
AMD IOMMU v1 amdv1_ops drivers/iommu/amd/iommu.c:2662 amdv1
AMD IOMMU v2 amdv2_ops drivers/iommu/amd/iommu.c:2740 x86_64
Intel VT-d intel_fs_paging_domain_ops x86_64
first-stage drivers/iommu/intel/iommu.c:3886
Intel VT-d intel_ss_paging_domain_ops vtdss
second-stage drivers/iommu/intel/iommu.c:3897
iommufd selftest mock_domain_ops etc amdv1_mock /
drivers/iommu/iommufd/selftest.c:403,411,425 amdv1
KUnit wrapper pgtbl_ops Delegates to
drivers/iommu/generic_pt/kunit_iommu_cmp.h:86 io_pgtable_ops
for comparison
testing
---------------------------------------------------------------------------------
Sub-page offset handling
When iova_to_phys(iova) is called with an IOVA that is not aligned to the
start of the mapped page/block (e.g. iova_to_phys(1) when a 4KB page is mapped
at IOVA 0), most implementations return the exact physical address including
the sub-page offset (phys_base + offset). Two do not.
Summary
------------------------------------------------------------------------------------------------------
Implementation Offset preserved? Mechanism
------------------------- ------------------------- --------------------------------------------------
arm_lpae (io-pgtable) YES iopte_to_paddr(pte) | (iova & (block_size-1))
arm_v7s (io-pgtable) YES iopte_to_paddr(pte) | (iova & ~LVL_MASK)
dart (io-pgtable) YES iopte_to_paddr(pte) | (iova & (pgsize-1))
sun50i-iommu YES page_addr + FIELD_GET(GENMASK(11,0), iova)
exynos-iommu YES *_phys(entry) + *_offs(iova) per granularity
riscv-iommu YES pfn_to_phys(pfn) | (iova & (pte_size-1))
omap-iommu YES (descriptor & mask) | (va & ~mask)
rockchip-iommu YES pt_address(pte) + rk_iova_page_offset(iova)
msm_iommu YES HW PAR register + VA low bits spliced back in
s390-iommu NO pte & ZPCI_PTE_ADDR_MASK — offset discarded
tegra-smmu YES SMMU_PFN_PHYS(pfn) + SMMU_OFFSET_IN_PAGE(iova)
mtk_iommu_v1 NO pte & ~(page_size-1) — offset discarded
sprd-iommu YES (pte << PAGE_SHIFT) + (iova & (page_size-1))
fsl_pamu YES (trivial) return iova — identity mapping
virtio-iommu YES paddr + (iova - mapping->iova.start)
generic_pt YES _pt_entry_oa_fast() | log2_mod(va, entry_lg2sz)
------------------------------------------------------------------------------------------------------
Category 1 drivers (arm-smmu-v3, arm-smmu, apple-dart, qcom_iommu, ipmmu-vmsa,
mtk_iommu) inherit the behavior of their io-pgtable backend — all preserve
offset.
Implementations that lose the offset
s390-iommu (drivers/iommu/s390-iommu.c:989): After the 3-level ZPCI walk,
returns pte & ZPCI_PTE_ADDR_MASK with no sub-page offset added back.
iova_to_phys(0) and iova_to_phys(1) return the same page-aligned PA.
mtk_iommu_v1 (drivers/iommu/mtk_iommu_v1.c:396): Looks up the PTE by
iova >> PAGE_SHIFT (discarding offset), then returns pte & ~(page_size-1). No
step adds the sub-page offset back.
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index c8d8eff5373d30..8db16989270cd8 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -401,7 +401,8 @@ static phys_addr_t mtk_iommu_v1_iova_to_phys(struct iommu_domain *domain, dma_ad
spin_lock_irqsave(&dom->pgtlock, flags);
pa = *(dom->pgt_va + (iova >> MT2701_IOMMU_PAGE_SHIFT));
- pa = pa & (~(MT2701_IOMMU_PAGE_SIZE - 1));
+ pa = (pa & (~(MT2701_IOMMU_PAGE_SIZE - 1))) |
+ (iova & (MT2701_IOMMU_PAGE_SIZE - 1));
spin_unlock_irqrestore(&dom->pgtlock, flags);
return pa;
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index fe679850af2861..57d27f3a984ed6 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -1015,7 +1015,8 @@ static phys_addr_t s390_iommu_iova_to_phys(struct iommu_domain *domain,
pto = get_st_pto(ste);
pte = READ_ONCE(pto[px]);
if (pt_entry_isvalid(pte))
- phys = pte & ZPCI_PTE_ADDR_MASK;
+ phys = (pte & ZPCI_PTE_ADDR_MASK) |
+ (iova & ~ZPCI_PTE_ADDR_MASK);
}
}