Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path

From: Robin Murphy
Date: Mon Mar 17 2025 - 14:22:59 EST


On 17/03/2025 7:37 am, Marek Szyprowski wrote:
On 13.03.2025 15:12, Robin Murphy wrote:
On 2025-03-13 1:06 pm, Robin Murphy wrote:
On 2025-03-13 12:23 pm, Marek Szyprowski wrote:
On 13.03.2025 12:01, Robin Murphy wrote:
On 2025-03-13 9:56 am, Marek Szyprowski wrote:
[...]
This patch landed in yesterday's linux-next as commit bcb81ac6ae3c
("iommu: Get DT/ACPI parsing into the proper probe path"). In my tests
I found it breaks booting of the ARM64 RK3568-based Odroid-M1 board
(arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts). Here is the
relevant kernel log:

...and the bug-flushing-out begins!

Unable to handle kernel NULL pointer dereference at virtual address
00000000000003e8
Mem abort info:
     ESR = 0x0000000096000004
     EC = 0x25: DABT (current EL), IL = 32 bits
     SET = 0, FnV = 0
     EA = 0, S1PTW = 0
     FSC = 0x04: level 0 translation fault
Data abort info:
     ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
     CM = 0, WnR = 0, TnD = 0, TagAccess = 0
     GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[00000000000003e8] user address but active_mm is swapper
Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #15533
Hardware name: Hardkernel ODROID-M1 (DT)
pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : devm_kmalloc+0x2c/0x114
lr : rk_iommu_of_xlate+0x30/0x90
...
Call trace:
    devm_kmalloc+0x2c/0x114 (P)
    rk_iommu_of_xlate+0x30/0x90

Yeah, looks like this is doing something a bit questionable which can't
work properly. TBH the whole dma_dev thing could probably be cleaned up
now that we have proper instances, but for now does this work?

Yes, this patch fixes the problem I've observed.

Reported-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>

BTW, this dma_dev idea has been borrowed from my exynos_iommu driver
and I doubt it can be cleaned up.

On the contrary I suspect they both can - it all dates back to when
we had the single global platform bus iommu_ops and the SoC drivers
were forced to bodge their own notion of multiple instances, but with
the modern core code, ops are always called via a valid IOMMU
instance or domain, so in principle it should always be possible to
get at an appropriate IOMMU device now. IIRC it was mostly about
allocating and DMA-mapping the pagetables in domain_alloc, where the
private notion of instances didn't have enough information, but
domain_alloc_paging solves that.

Bah, in fact I think I am going to have to do that now, since although
it doesn't crash, rk_domain_alloc_paging() will also be failing for
the same reason. Time to find a PSU for the RK3399 board, I guess...

(Or maybe just move the dma_dev assignment earlier to match Exynos?)

Well, I just found that Exynos IOMMU is also broken on some of my test
boards. It looks like the runtime PM links are somehow not correctly
established. I will try to analyze this later in the afternoon.

Hmm, I tried to get an Odroid-XU3 up and running, but it seems unable to boot my original 6.14-rc3-based branch - even with the IOMMU driver disabled, it's consistently dying somewhere near (or just after) init with what looks like some catastrophic memory corruption issue - very occasionally it's managed to print the first line of various different panics.

Before that point though, with the IOMMU driver enabled it does appear to show signs of working OK:

[ 0.649703] exynos-sysmmu 14650000.sysmmu: hardware version: 3.3
[ 0.654220] platform 14450000.mixer: Adding to iommu group 1
...
[ 2.680920] exynos-mixer 14450000.mixer: exynos_iommu_attach_device: Attached IOMMU with pgtable 0x42924000
...
[ 5.196674] exynos-mixer 14450000.mixer: exynos_iommu_identity_attach: Restored IOMMU to IDENTITY from pgtable 0x42924000
[ 5.207091] exynos-mixer 14450000.mixer: exynos_iommu_attach_device: Attached IOMMU with pgtable 0x42884000


The multi-instance stuff in probe/release does look a bit suspect, however - seems like the second instance probe would overwrite the first instance's links, and then there would be a double-del() if the device were ever actually released again? I may have made that much more likely to happen, but I suspect it was already possible with async driver probe...

Thanks,
Robin.