Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists

From: Jason Gunthorpe

Date: Thu Feb 26 2026 - 20:03:32 EST


On Thu, Feb 26, 2026 at 09:40:10PM +0100, Antheas Kapenekakis wrote:
> I am still concerned about unaligned checks. It is a functional change
> that can cause regressions in all devices. The approach of this patch
> does not affect behavior in other devices. I would like for Jason to
> weigh in.

I think Robin's solution is very clever, but I share the concern
regarding what all the implementations do.

So, I fed this question to Claude. It did find two counterexamples (see
below for the whole report I had it generate):

Implementations that lose the offset

s390-iommu (drivers/iommu/s390-iommu.c:989): After the 3-level ZPCI walk,
returns pte & ZPCI_PTE_ADDR_MASK with no sub-page offset added back.
iova_to_phys(0) and iova_to_phys(1) return the same page-aligned PA.

mtk_iommu_v1 (drivers/iommu/mtk_iommu_v1.c:396): Looks up the PTE by
iova >> PAGE_SHIFT (discarding offset), then returns pte & ~(page_size-1). No
step adds the sub-page offset back.

I checked myself and it seems correct. I didn't try to confirm that
the cases it says are OK are in fact OK, but it paints a convincing
picture.

I doubt s390 uses the function you are fixing, and I have no idea
about mtk. Below is also a diff showing how Claude thought to fix it;
I didn't try to check it.

So, I'd say if Robin is OK with these outliers then it is a good and
fine approach.

Jason

iova_to_phys Implementation Survey

Entry point: iommu_iova_to_phys() in drivers/iommu/iommu.c:2502 calls
domain->ops->iova_to_phys(domain, iova) via iommu_domain_ops.
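
As a rough illustration of that dispatch, here is a minimal userspace sketch.
Every name prefixed with toy_ is hypothetical and only modeled on the kernel
structures named above; only the indirect call through the ops table is shown,
not whatever else the real entry point does.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_phys_addr_t;
typedef uint64_t toy_dma_addr_t;

struct toy_domain;

/* Hypothetical per-driver callback table, modeled on iommu_domain_ops. */
struct toy_domain_ops {
	toy_phys_addr_t (*iova_to_phys)(struct toy_domain *domain,
					toy_dma_addr_t iova);
};

struct toy_domain {
	const struct toy_domain_ops *ops;
	toy_phys_addr_t base;	/* toy state: one linear mapping from IOVA 0 */
};

/* Toy driver callback: a linear mapping keeps the offset trivially. */
toy_phys_addr_t toy_linear_iova_to_phys(struct toy_domain *domain,
					toy_dma_addr_t iova)
{
	return domain->base + iova;
}

const struct toy_domain_ops toy_linear_ops = {
	.iova_to_phys = toy_linear_iova_to_phys,
};

/* Core entry point: the result is whatever the driver callback returns. */
toy_phys_addr_t toy_iova_to_phys(struct toy_domain *domain,
				 toy_dma_addr_t iova)
{
	assert(domain && domain->ops && domain->ops->iova_to_phys);
	return domain->ops->iova_to_phys(domain, iova);
}
```

The point of the sketch is that the core code cannot add or remove a sub-page
offset on its own; whether iova_to_phys(1) differs from iova_to_phys(0) is
decided entirely by the per-driver callback.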

Category 1 — Delegates to io-pgtable

These drivers hold an io_pgtable_ops * and call ops->iova_to_phys(ops, iova).
The actual walk happens in one of the io-pgtable backends listed in Category
4.

--------------------------------------------------------------------------------------------------------
Driver          Function (file:line)                              Ops assignment  Notes
--------------  ------------------------------------------------  --------------  ----------------------
arm-smmu-v3     arm_smmu_iova_to_phys                             :3767           Pure delegation
                drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3471

arm-smmu        arm_smmu_iova_to_phys                             :1655           S1 with FEAT_TRANS_OPS
v1/v2           drivers/iommu/arm/arm-smmu/arm-smmu.c:1387                        uses hw ATS1PR
                                                                                  registers (CB_PAR),
                                                                                  otherwise io-pgtable

apple-dart      apple_dart_iova_to_phys                           :1021           Pure delegation →
                drivers/iommu/apple-dart.c:531                                    io-pgtable-dart

qcom_iommu      qcom_iommu_iova_to_phys                           :605            Delegation with
                drivers/iommu/arm/arm-smmu/qcom_iommu.c:492                       spinlock

ipmmu-vmsa      ipmmu_iova_to_phys                                :895            Uses ARM_32_LPAE_S1
                drivers/iommu/ipmmu-vmsa.c:702                                    format

mtk_iommu       mtk_iommu_iova_to_phys                            :1073           Delegation + 4GB mode
                drivers/iommu/mtk_iommu.c:861                                     PA remap fixup
--------------------------------------------------------------------------------------------------------

Category 2 — Open-coded page table walk

These drivers implement their own page table traversal without io-pgtable.

--------------------------------------------------------------------------------------------------
Driver          Function (file:line)                Ops assignment  Walk structure
--------------  ----------------------------------  --------------  ------------------------------
sun50i-iommu    sun50i_iommu_iova_to_phys           :860            2-level (DTE → PTE)
                drivers/iommu/sun50i-iommu.c:662

exynos-iommu    exynos_iommu_iova_to_phys           :1487           2-level (section/large/small
                drivers/iommu/exynos-iommu.c:1375                   page)

riscv-iommu     riscv_iommu_iova_to_phys            :1355           Sv39/48/57 via
                drivers/iommu/riscv/iommu.c:1280                    riscv_iommu_pte_fetch (:1166)

omap-iommu      omap_iommu_iova_to_phys             :1727           iopgtable_lookup_entry helper
                drivers/iommu/omap-iommu.c:1596                     (super section/section/
                                                                    large/small)

rockchip-iommu  rk_iommu_iova_to_phys               :1190           2-level (DTE → PTE)
                drivers/iommu/rockchip-iommu.c:651

msm_iommu       msm_iommu_iova_to_phys              :709            Hardware walk: writes VA to
                drivers/iommu/msm_iommu.c:526                       V2PPR register, reads PA from
                                                                    PAR register

s390-iommu      s390_iommu_iova_to_phys             :1186           3-level ZPCI (region → segment
                drivers/iommu/s390-iommu.c:989                      → page)

tegra-smmu      tegra_smmu_iova_to_phys             :1010           2-level via
                drivers/iommu/tegra-smmu.c:806                      tegra_smmu_pte_lookup

mtk_iommu_v1    mtk_iommu_v1_iova_to_phys           :593            Flat single-level table
                drivers/iommu/mtk_iommu_v1.c:396

sprd-iommu      sprd_iommu_iova_to_phys             :423            Flat single-level table
                drivers/iommu/sprd-iommu.c:369
--------------------------------------------------------------------------------------------------

Category 3 — Special / trivial

------------------------------------------------------------------------------------------------
Driver        Function (file:line)                 Ops assignment  Mechanism
------------  -----------------------------------  --------------  -----------------------------
fsl_pamu      fsl_pamu_iova_to_phys                :438            Identity: returns iova
              drivers/iommu/fsl_pamu_domain.c:172                  (after aperture bounds check)

virtio-iommu  viommu_iova_to_phys                  :1105           Interval tree reverse lookup
              drivers/iommu/virtio-iommu.c:915                     (no page table)
------------------------------------------------------------------------------------------------
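
Neither of these walks a page table. The two mechanisms can be sketched in
userspace C as below. This is only an illustration: the toy_ names are
hypothetical, a linear array scan stands in for the interval tree, and
returning 0 for an IOVA that is not covered is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one mapping: IOVA range [start, last] -> paddr. */
struct toy_mapping {
	uint64_t start;
	uint64_t last;
	uint64_t paddr;
};

/*
 * Reverse lookup over recorded mappings. A linear scan stands in for
 * the interval tree. Returns 0 when no mapping covers the IOVA.
 */
uint64_t toy_interval_iova_to_phys(const struct toy_mapping *maps, size_t n,
				   uint64_t iova)
{
	assert(maps != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (iova >= maps[i].start && iova <= maps[i].last)
			return maps[i].paddr + (iova - maps[i].start);
	}
	return 0;
}

/* Identity translation guarded by an aperture bounds check. */
uint64_t toy_identity_iova_to_phys(uint64_t aperture_start,
				   uint64_t aperture_end, uint64_t iova)
{
	if (iova < aperture_start || iova > aperture_end)
		return 0;
	return iova;
}
```

In both cases the offset within the mapping is carried through without any
masking step, which is why these show up as offset-preserving in the summary
further down.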

Category 4 — io_pgtable_ops backends

These implement struct io_pgtable_ops.iova_to_phys and are the ultimate walk
functions called by Category 1 drivers.

--------------------------------------------------------------------------------------------------
Backend     Function (file:line)                    Ops assignment  Walk strategy
----------  --------------------------------------  --------------  ------------------------------
ARM LPAE    arm_lpae_iova_to_phys                   :950            Visitor pattern via
(64-bit)    drivers/iommu/io-pgtable-arm.c:734                      __arm_lpae_iopte_walk; covers
                                                                    ARM_64_LPAE_S1, S2,
                                                                    ARM_MALI_LPAE

ARM v7s     arm_v7s_iova_to_phys                    :716            Iterative do-while 2-level;
(32-bit)    drivers/iommu/io-pgtable-arm-v7s.c:644                  handles contiguous entries

Apple DART  dart_iova_to_phys                       :402            dart_get_last pre-walks to
            drivers/iommu/io-pgtable-dart.c:336                     leaf table, then single lookup
--------------------------------------------------------------------------------------------------

Category 5 — generic_pt framework

All these drivers use IOMMU_PT_DOMAIN_OPS(fmt) which routes iova_to_phys into
the template function pt_iommu_<fmt>_iova_to_phys at
drivers/iommu/generic_pt/iommu_pt.h:170. The walk uses pt_walk_range +
PT_MAKE_LEVELS to generate a fully-inlined unrolled per-level walk; OA
extracted via pt_entry_oa_exact.

--------------------------------------------------------------------------------------------
Driver            Ops struct (file:line)                         Format
----------------  ---------------------------------------------  ---------------------------
AMD IOMMU v1      amdv1_ops drivers/iommu/amd/iommu.c:2662       amdv1

AMD IOMMU v2      amdv2_ops drivers/iommu/amd/iommu.c:2740       x86_64

Intel VT-d        intel_fs_paging_domain_ops                     x86_64
first-stage       drivers/iommu/intel/iommu.c:3886

Intel VT-d        intel_ss_paging_domain_ops                     vtdss
second-stage      drivers/iommu/intel/iommu.c:3897

iommufd selftest  mock_domain_ops etc                            amdv1_mock / amdv1
                  drivers/iommu/iommufd/selftest.c:403,411,425

KUnit wrapper     pgtbl_ops                                      Delegates to io_pgtable_ops
                  drivers/iommu/generic_pt/kunit_iommu_cmp.h:86  for comparison testing
--------------------------------------------------------------------------------------------

Sub-page offset handling

When iova_to_phys(iova) is called with an IOVA that is not aligned to the
start of the mapped page/block (e.g. iova_to_phys(1) when a 4KB page is mapped
at IOVA 0), most implementations return the exact physical address including
the sub-page offset (phys_base + offset). Two do not.

Summary

------------------------------------------------------------------------------------------
Implementation          Offset preserved?  Mechanism
----------------------  -----------------  -----------------------------------------------
arm_lpae (io-pgtable)   YES                iopte_to_paddr(pte) | (iova & (block_size-1))

arm_v7s (io-pgtable)    YES                iopte_to_paddr(pte) | (iova & ~LVL_MASK)

dart (io-pgtable)       YES                iopte_to_paddr(pte) | (iova & (pgsize-1))

sun50i-iommu            YES                page_addr + FIELD_GET(GENMASK(11,0), iova)

exynos-iommu            YES                *_phys(entry) + *_offs(iova) per granularity

riscv-iommu             YES                pfn_to_phys(pfn) | (iova & (pte_size-1))

omap-iommu              YES                (descriptor & mask) | (va & ~mask)

rockchip-iommu          YES                pt_address(pte) + rk_iova_page_offset(iova)

msm_iommu               YES                HW PAR register + VA low bits spliced back in

s390-iommu              NO                 pte & ZPCI_PTE_ADDR_MASK; offset discarded

tegra-smmu              YES                SMMU_PFN_PHYS(pfn) + SMMU_OFFSET_IN_PAGE(iova)

mtk_iommu_v1            NO                 pte & ~(page_size-1); offset discarded

sprd-iommu              YES                (pte << PAGE_SHIFT) + (iova & (page_size-1))

fsl_pamu                YES (trivial)      return iova; identity mapping

virtio-iommu            YES                paddr + (iova - mapping->iova.start)

generic_pt              YES                _pt_entry_oa_fast() | log2_mod(va, entry_lg2sz)
------------------------------------------------------------------------------------------

Category 1 drivers (arm-smmu-v3, arm-smmu, apple-dart, qcom_iommu, ipmmu-vmsa,
mtk_iommu) inherit the behavior of their io-pgtable backend — all preserve
offset.
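
Most of the offset-preserving rows in the summary share one shape: take the
aligned output address from the leaf entry and merge in the low IOVA bits
covered by that entry's size. A minimal sketch, assuming power-of-two entry
sizes; toy_oa_with_offset is a hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Combine an aligned output base with the IOVA's offset inside the leaf
 * entry. entry_size must be a power of two (e.g. a 4KB page or a 2MB
 * block), and base must be aligned to it.
 */
uint64_t toy_oa_with_offset(uint64_t base, uint64_t iova, uint64_t entry_size)
{
	uint64_t mask = entry_size - 1;

	assert(entry_size != 0 && (entry_size & mask) == 0);
	assert((base & mask) == 0);

	/* With an aligned base no bits overlap, so OR and ADD agree. */
	return base | (iova & mask);
}
```

Because the base is aligned, the "|" and "+" spellings in the table are
interchangeable. The mask has to follow the size of the entry that was
actually hit: for a 2MB block the offset is 21 bits wide, not 12.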

Implementations that lose the offset

s390-iommu (drivers/iommu/s390-iommu.c:989): After the 3-level ZPCI walk,
returns pte & ZPCI_PTE_ADDR_MASK with no sub-page offset added back.
iova_to_phys(0) and iova_to_phys(1) return the same page-aligned PA.

mtk_iommu_v1 (drivers/iommu/mtk_iommu_v1.c:396): Looks up the PTE by
iova >> PAGE_SHIFT (discarding offset), then returns pte & ~(page_size-1). No
step adds the sub-page offset back.
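
The mtk_iommu_v1 case can be modeled in a few lines of userspace C. This is a
sketch only: the toy_ names, the 16-entry table, and the 4KB page size are
assumptions made for illustration, and the "exact" variant simply mirrors the
shape of the proposed diff below.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assumed 4KB pages for illustration */
#define TOY_PAGE_SIZE  (UINT32_C(1) << TOY_PAGE_SHIFT)
#define TOY_NUM_PTES   16

/* Flat table: entry i holds a physical page base plus low flag bits. */
struct toy_flat_pgt {
	uint32_t pte[TOY_NUM_PTES];
};

/* Offset lost: index by page number, then mask down to the page base. */
uint32_t toy_flat_lookup_lossy(const struct toy_flat_pgt *pgt, uint32_t iova)
{
	uint32_t pa;

	assert((iova >> TOY_PAGE_SHIFT) < TOY_NUM_PTES);
	pa = pgt->pte[iova >> TOY_PAGE_SHIFT];
	return pa & ~(TOY_PAGE_SIZE - 1);
}

/* Offset kept: same lookup, with the low IOVA bits merged back in. */
uint32_t toy_flat_lookup_exact(const struct toy_flat_pgt *pgt, uint32_t iova)
{
	uint32_t pa;

	assert((iova >> TOY_PAGE_SHIFT) < TOY_NUM_PTES);
	pa = pgt->pte[iova >> TOY_PAGE_SHIFT];
	return (pa & ~(TOY_PAGE_SIZE - 1)) | (iova & (TOY_PAGE_SIZE - 1));
}
```

With the lossy variant every IOVA inside a page translates to the same
page-aligned address, so a caller comparing results for an aligned and an
unaligned IOVA cannot tell them apart.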


diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index c8d8eff5373d30..8db16989270cd8 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -401,7 +401,8 @@ static phys_addr_t mtk_iommu_v1_iova_to_phys(struct iommu_domain *domain, dma_ad

 	spin_lock_irqsave(&dom->pgtlock, flags);
 	pa = *(dom->pgt_va + (iova >> MT2701_IOMMU_PAGE_SHIFT));
-	pa = pa & (~(MT2701_IOMMU_PAGE_SIZE - 1));
+	pa = (pa & (~(MT2701_IOMMU_PAGE_SIZE - 1))) |
+	     (iova & (MT2701_IOMMU_PAGE_SIZE - 1));
 	spin_unlock_irqrestore(&dom->pgtlock, flags);

 	return pa;
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index fe679850af2861..57d27f3a984ed6 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -1015,7 +1015,8 @@ static phys_addr_t s390_iommu_iova_to_phys(struct iommu_domain *domain,
 			pto = get_st_pto(ste);
 			pte = READ_ONCE(pto[px]);
 			if (pt_entry_isvalid(pte))
-				phys = pte & ZPCI_PTE_ADDR_MASK;
+				phys = (pte & ZPCI_PTE_ADDR_MASK) |
+				       (iova & ~ZPCI_PTE_ADDR_MASK);
 		}
 	}