Re: [PATCH] iommu: Fix bypass of IOMMU readiness check for multi-IOMMU devices
From: Tudor Ambarus
Date: Tue Apr 14 2026 - 09:04:38 EST
Hi, Robin,
Thanks for the educative answers.
On 4/2/26 5:20 PM, Robin Murphy wrote:
> On 2026-04-02 12:59 pm, Jason Gunthorpe wrote:
>> On Thu, Apr 02, 2026 at 02:25:54PM +0300, Tudor Ambarus wrote:
>>
>>> I can probably track whether all instances are ready, and defer if any
>>> is not ready, but then I'll force the iommu clients to use the sketchy
>>> replay path, which seems like a bad idea, according to Robin's feedback.
>>
>> I didn't think that was sketchy, it is part of the boot ordering
>> system to ensure that the iommu driver(s) is probed before the client
>> devices.
>>
>> Half operating a device is definately going to get things into trouble
>> with broken/incomplete domain attachments at least.
>
> The Exynos driver itself is actually fine, and doing everything right. We'll never have a "half-configured" client device in IOMMU API terms currently - only once both instances are registered such that both of_xlate calls can succeed (one for each specifier in the client device's "iommus" property) will we proceed to calling probe_device, which will then work as normal.
I assume this is true because of core_initcall(exynos_iommu_init);
Or is it something else that guarantees that both the IOMMU instances
are registered at exynos_iommu_of_xlate() time? The instance's drvdata
is set before the instance registers, and the rest of the code in
exynos_iommu_of_xlate constructs the client's list of iommu instances.
>
> The issue here is purely in the race-avoidance scheme within of_iommu_configure() itself, which hasn't accounted for the fact that when it's looping over multiple specifiers, they don't necessarily all target the same IOMMU node. And it's only during a window where the instance targeted by the first specifier happens to be registered already, and the second is currently in the middle of registering.
Allow me to give an example. I added some extra prints to catch things.
I attached a diff if you want to see exactly where. I stripped a bit
the log so that it's easier to follow. I added comments.
19471000.drmdecon describe 19840000.sysmmu and 19c40000.sysmmu in its
"iommus" dt property.
[ 2.308076][ T1] samsung-sysmmu-v9 19840000.sysmmu: tudor: samsung_sysmmu_device_probe: after iommu_device_register
^ first instance registered
[ 2.310095][ T86] platform 19471000.drmdecon: tudor: really_probe: enter for driver exynos-decon
^ client in really_probe()
[ 2.311634][ T1] samsung-sysmmu-v9 19c40000.sysmmu: tudor: samsung_sysmmu_device_probe: before iommu_device_register
^ second instance right before calling iommu_device_register()
[ 2.312358][ T86] exynos-decon 19471000.drmdecon: tudor: really_probe: after setting dev->driver exynos-decon
^ client set dev->driver!
[ 2.348787][ T86] exynos-decon 19471000.drmdecon: tudor: of_iommu_configure: of_iommu_configure_device err = 0, dev_iommu_present = 0
^ of_xlate succeeded for both instances
We're in the client's really_probe() path. Shall of_xlate fail if the iommu instance is not registered yet?
[ 2.349277][ T86] exynos-decon 19471000.drmdecon: tudor: of_iommu_configure: call iommu_probe_device new dev_iommu_present = 1, dev->iommu = 000000007deaca98
^ initial dev->iommu was not set, we prepare to call iommu_probe_device() even if dev->iommu was set in the of_iommu_configure_device() call
The thread is going to sleep now, it can't acquire iommu_probe_device_lock
[ 2.358261][ T1] exynos-decon 19471000.drmdecon: tudor: iommu_init_device: bus->dma_configure not called dev->iommu = 000000007deaca98, dev->iommu_group = 0000000000000000, dev->iommu->fwspec = 00000000896797d4 dev->driver = 000000008821e7b2
^ inside iommu_init_device(), dev->bus->dma_configure(dev) is NOT called. Then it calls ops->probe_device() and ops->device_group(), succeeding.
[ 2.360561][ T1] ------------[ cut here ]------------
[ 2.360729][ T1] exynos-decon 19471000.drmdecon: late IOMMU probe at driver bind, something fishy here!
[ 2.361153][ T1] WARNING: drivers/iommu/iommu.c:650 at __iommu_probe_device+0x3a0/0x608, CPU#7: init/1
[ 2.369032][ T1] CPU: 7 UID: 0 PID: 1 Comm: init Tainted: G S O 6.19.0-mainline-g2887d9327e73-dirty #1 PREEMPT 12ccb00dc0e1a023f9ae52bf43dc85a3c6f0cd0b
[ 2.369592][ T1] Tainted: [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
[ 2.369848][ T1] Hardware name: ZUMA PRO KOMODO EVT 1.0 Broadcom GNSS board based on ZUMA PRO (DT)
[ 2.370205][ T1] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 2.370499][ T1] pc : __iommu_probe_device+0x3a0/0x608
[ 2.370681][ T1] lr : __iommu_probe_device+0x3a0/0x608
[ 2.370915][ T1] sp : ffffc0008005b4c0
[ 2.371043][ T1] x29: ffffc0008005b4c0 x28: ffff80001d791290 x27: 0000000000000000
[ 2.371372][ T1] x26: 000000000000001d x25: ffffc0008005b598 x24: ffffe42571f2dd08
[ 2.371674][ T1] x23: ffff80001e00e490 x22: ffff800005744e00 x21: ffffe4256fd687e0
[ 2.371976][ T1] x20: ffffc0008005b598 x19: ffff800005b79810 x18: ffffe425731e13c0
[ 2.372277][ T1] x17: 000000008c623181 x16: 000000008c623181 x15: 0000000000000000
[ 2.372553][ T1] x14: 0000000000000003 x13: ffff800a7eef8000 x12: 0000000000000003
[ 2.372881][ T1] x11: 0000000000000003 x10: 00000000ffff7fff x9 : 90cf001dbb0ee800
[ 2.373184][ T1] x8 : 90cf001dbb0ee800 x7 : ffffe42571d610b4 x6 : 0000000000000000
[ 2.373486][ T1] x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffffc0008005b210
[ 2.373788][ T1] x2 : 0000000000000dba x1 : 0000000000000000 x0 : 0000000000000056
[ 2.374089][ T1] Call trace:
[ 2.374185][ T1] __iommu_probe_device+0x3a0/0x608 (P)
[ 2.374419][ T1] probe_iommu_group+0x3c/0x68
[ 2.374570][ T1] bus_for_each_dev+0x104/0x160
[ 2.374752][ T1] iommu_device_register+0xe0/0x274
[ 2.374947][ T1] samsung_sysmmu_device_probe+0x720/0x938 [samsung_iommu_v9 756cd85ee7308d7c98df18fc7d1a25a4db1266e9]
[ 2.375392][ T1] platform_probe+0x74/0xb8
[ 2.375533][ T1] really_probe+0x1a4/0x50c
[ 2.375702][ T1] __driver_probe_device+0x98/0xd4
[ 2.375892][ T1] driver_probe_device+0x3c/0x220
[ 2.376080][ T1] __driver_attach+0x118/0x1f8
[ 2.376258][ T1] bus_for_each_dev+0x104/0x160
[ 2.376440][ T1] driver_attach+0x24/0x34
[ 2.376603][ T1] bus_add_driver+0x140/0x2ac
[ 2.376779][ T1] driver_register+0x68/0x104
[ 2.376953][ T1] __platform_driver_register+0x20/0x30
[ 2.377186][ T1] init_module+0x20/0x3fd8 [samsung_iommu_v9 756cd85ee7308d7c98df18fc7d1a25a4db1266e9]
[ 2.377524][ T1] do_one_initcall+0x100/0x3ac
[ 2.377702][ T1] do_init_module+0x58/0x264
[ 2.377872][ T1] load_module+0x1244/0x1290
[ 2.378047][ T1] __arm64_sys_finit_module+0x248/0x334
[ 2.378280][ T1] invoke_syscall+0x58/0xe4
[ 2.378423][ T1] el0_svc_common+0x8c/0xd8
[ 2.378590][ T1] do_el0_svc+0x1c/0x28
[ 2.378745][ T1] el0_svc+0x54/0x1c4
[ 2.378892][ T1] el0t_64_sync_handler+0x84/0x12c
[ 2.379084][ T1] el0t_64_sync+0x1c4/0x1c8
[ 2.379252][ T1] irq event stamp: 822284
[ 2.379413][ T1] hardirqs last enabled at (822283): [<ffffe42570bc7dc0>] __console_unlock+0x64/0xbc
[ 2.379800][ T1] hardirqs last disabled at (822284): [<ffffe42571d52748>] el1_brk64+0x20/0x54
[ 2.380139][ T1] softirqs last enabled at (822092): [<ffffe42570b02efc>] handle_softirqs+0x484/0x504
[ 2.380505][ T1] softirqs last disabled at (822085): [<ffffe42570a10518>] __do_softirq+0x14/0x20
[ 2.380854][ T1] ---[ end trace 0000000000000000 ]---
[ 2.381058][ T1] exynos-decon 19471000.drmdecon: Adding to iommu group 1
[ 2.382126][ T86] exynos-decon 19471000.drmdecon: tudor: __iommu_probe_device: enter
[ 2.382420][ T86] exynos-decon 19471000.drmdecon: tudor: of_iommu_configure: after iommu_probe_device err = 0
[ 2.382734][ T86] exynos-decon 19471000.drmdecon: tudor: iommu_device_use_default_domain: we hit !group->default_domain
^ returns -EPROBE_DEFER
[ 2.383125][ T86] exynos-decon 19471000.drmdecon: tudor: really_probe: after dma_configure exynos-decon ret = -517
[ 2.383445][ T86] exynos-decon 19471000.drmdecon: tudor: really_probe: clear dev->driver exynos-decon
[
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index bea8da5f8a3a..efa1634214e6 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -609,6 +609,9 @@ static int really_probe(struct device *dev, const struct device_driver *drv)
bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
!drv->suppress_bind_attrs;
int ret, link_ret;
+ bool is_decon = (dev_name(dev) && strstr(dev_name(dev), "drmdecon"));
+
+ dev_err(dev, "tudor: %s enter for driver %s\n", __func__, drv->name);
if (defer_all_probes) {
/*
@@ -616,11 +619,14 @@ static int really_probe(struct device *dev, const struct device_driver *drv)
* device_block_probing() which, in turn, will call
* wait_for_device_probe() right after that to avoid any races.
*/
- dev_dbg(dev, "Driver %s force probe deferral\n", drv->name);
+ dev_err(dev, "Driver %s force probe deferral\n", drv->name);
return -EPROBE_DEFER;
}
link_ret = device_links_check_suppliers(dev);
+ if (is_decon)
+ dev_err(dev, "tudor: %s device_links_check_suppliers() ret %d\n",
+ __func__, link_ret);
if (link_ret == -EPROBE_DEFER)
return link_ret;
@@ -633,7 +639,13 @@ static int really_probe(struct device *dev, const struct device_driver *drv)
}
re_probe:
+ if (is_decon)
+ dev_err(dev, "tudor: %s before setting dev->driver %s\n",
+ __func__, drv->name);
device_set_driver(dev, drv);
+ if (is_decon)
+ dev_err(dev, "tudor: %s after setting dev->driver %s\n",
+ __func__, drv->name);
/* If using pinctrl, bind pins now before probing */
ret = pinctrl_bind_pins(dev);
@@ -641,7 +653,13 @@ static int really_probe(struct device *dev, const struct device_driver *drv)
goto pinctrl_bind_failed;
if (dev->bus->dma_configure) {
+ if (is_decon)
+ dev_err(dev, "tudor: %s before dma_configure %s\n",
+ __func__, drv->name);
ret = dev->bus->dma_configure(dev);
+ if (is_decon)
+ dev_err(dev, "tudor: %s after dma_configure %s ret = %d\n",
+ __func__, drv->name, ret);
if (ret)
goto pinctrl_bind_failed;
}
@@ -723,6 +741,10 @@ static int really_probe(struct device *dev, const struct device_driver *drv)
if (dev->bus && dev->bus->dma_cleanup)
dev->bus->dma_cleanup(dev);
pinctrl_bind_failed:
+
+ if (is_decon)
+ dev_err(dev, "tudor: %s clear dev->driver %s\n",
+ __func__, drv->name);
device_links_no_driver(dev);
device_unbind_cleanup(dev);
done:
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4926a43118e6..35042965b0f7 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -450,6 +450,11 @@ static int iommu_init_device(struct device *dev)
struct iommu_device *iommu_dev;
struct iommu_group *group;
int ret;
+ bool is_decon = (dev_name(dev) && strstr(dev_name(dev), "drmdecon"));
+
+ if (is_decon)
+ dev_err(dev, "tudor: %s, dev->iommu = %p, dev->iommu_group = %p\n",
+ __func__, dev->iommu, dev->iommu_group);
if (!dev_iommu_get(dev))
return -ENOMEM;
@@ -464,10 +469,19 @@ static int iommu_init_device(struct device *dev)
mutex_unlock(&iommu_probe_device_lock);
dev->bus->dma_configure(dev);
mutex_lock(&iommu_probe_device_lock);
+ if (is_decon)
+ dev_err(dev, "tudor: %s, after bus->dma_configure(): dev->iommu = %p, dev->iommu_group = %p\n",
+ __func__, dev->iommu, dev->iommu_group);
/* If another instance finished the job for us, skip it */
if (!dev->iommu || dev->iommu_group)
return -ENODEV;
+ } else {
+ if (is_decon)
+ dev_err(dev, "tudor: %s, bus->dma_configure not called dev->iommu = %p, dev->iommu_group = %p, dev->iommu->fwspec = %p dev->driver = %p\n",
+ __func__, dev->iommu, dev->iommu_group, dev->iommu->fwspec, dev->driver);
}
+
+
/*
* At this point, relevant devices either now have a fwspec which will
* match ops registered with a non-NULL fwnode, or we can reasonably
@@ -608,7 +622,10 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
struct iommu_group *group;
struct group_device *gdev;
int ret;
+ bool is_decon = (dev_name(dev) && strstr(dev_name(dev), "drmdecon"));
+ if (is_decon)
+ dev_err(dev, "tudor: %s enter\n", __func__);
/*
* Serialise to avoid races between IOMMU drivers registering in
* parallel and/or the "replay" calls from ACPI/OF code via client
@@ -3241,6 +3258,7 @@ int iommu_device_use_default_domain(struct device *dev)
/* Caller is the driver core during the pre-probe path */
struct iommu_group *group = dev->iommu_group;
int ret = 0;
+ bool is_decon = (dev_name(dev) && strstr(dev_name(dev), "drmdecon"));
if (!group)
return 0;
@@ -3248,6 +3266,9 @@ int iommu_device_use_default_domain(struct device *dev)
mutex_lock(&group->mutex);
/* We may race against bus_iommu_probe() finalising groups here */
if (!group->default_domain) {
+ if (is_decon)
+ dev_err(dev, "tudor: %s, we hit !group->default_domain\n",
+ __func__);
ret = -EPROBE_DEFER;
goto unlock_out;
}
diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 6b989a62def2..af89f1c874dc 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -118,6 +118,7 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
{
bool dev_iommu_present;
int err;
+ bool is_decon = (dev_name(dev) && strstr(dev_name(dev), "drmdecon"));
if (!master_np)
return -ENODEV;
@@ -155,13 +156,25 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
dev_iommu_free(dev);
mutex_unlock(&iommu_probe_device_lock);
+ if (is_decon)
+ dev_err(dev, "tudor: %s: of_iommu_configure_device err = %d, dev_iommu_present = %d\n",
+ __func__, err, dev_iommu_present);
/*
* If we're not on the iommu_probe_device() path (as indicated by the
* initial dev->iommu) then try to simulate it. This should no longer
* happen unless of_dma_configure() is being misused outside bus code.
*/
- if (!err && dev->bus && !dev_iommu_present)
+ if (!err && dev->bus && !dev_iommu_present) {
+ if (is_decon) {
+ dev_iommu_present = dev->iommu;
+ dev_err(dev, "tudor: %s call iommu_probe_device new dev_iommu_present = %d, dev->iommu = %p\n",
+ __func__, dev_iommu_present, dev->iommu);
+ }
err = iommu_probe_device(dev);
+ if (is_decon)
+ dev_err(dev, "tudor: %s after iommu_probe_device err = %d\n",
+ __func__, err);
+ }
if (err && err != -EPROBE_DEFER)
dev_dbg(dev, "Adding to IOMMU failed: %d\n", err);