Re: [PATCH RFC] iommu: fix wrong DMA ops for iommu device

From: Robin Murphy
Date: Fri Jan 24 2025 - 09:17:52 EST


On 2025-01-24 12:32 am, Charan Teja Kalla wrote:


On 1/22/2025 8:13 PM, Robin Murphy wrote:
Please lmk If I can submit this as V2 patch, with your name as
Suggested-by?
I missed to mention that this patch is fixing the problem we are
seeing..Thanks.

Ah, that's good. I did spend most of an afternoon digging through git
history to write up this patch the other day, and was planning to send
it properly soon (probably after the merge window now) - as noted, it
also turns out not to be quite right for 6.6LTS as-is, so I figured I'd
save worrying about backports until it lands.

Thanks a lot Robin for sending this patch with much more details!!


Thanks,
Robin.

----->8-----
Subject: [PATCH] iommu: Handle race with default domain setup

It turns out that deferred default domain creation leaves a subtle
race window during iommu_device_register() wherein a client driver may
asynchronously probe in parallel and get as far as performing DMA API
operations with dma-direct, only to be switched to iommu-dma underfoot
once the default domain attachment finally happens, with obviously
disastrous consequences. Even the wonky of_iommu_configure() path is at
risk, since iommu_fwspec_init() will no longer defer client probe as the
instance ops are (necessarily) already registered, and the "replay"
iommu_probe_device() call can see dev->iommu_group already set and so
think there's nothing to do either.

Fortunately we already have the right tool in the right place in the
form of iommu_device_use_default_domain(), which just needs to ensure
that said default domain is actually ready to *be* used. Deferring the
client probe shouldn't have too much impact, given that this only
happens while the IOMMU driver is probing, and thus due to kick the
deferred probe list again once it finishes.

Reported-by: Charan Teja Kalla <quic_charante@xxxxxxxxxxx>
Fixes: 98ac73f99bc4 ("iommu: Require a default_domain for all iommu
drivers")
Signed-off-by: Robin Murphy <robin.murphy@xxxxxxx>

Do you think If the below tag can be added?
Closes:
https://lore.kernel.org/all/20241213150415.653821-1-quic_charante@xxxxxxxxxxx/

Sure, I guess we don't have a pedantic "Will-close-once-backported:" tag and it'll shut checkpatch up at least :)

---

Note this fixes tag is rather nuanced - historically there was a more
general issue before deac0b3bed26 ("iommu: Split off default domain
allocation from group assignment") set the basis for the current
conditions; 1ea2a07a532b ("iommu: Add DMA ownership management
interfaces") is then the point at which it becomes logical to fix the
current race this way; however only from 98ac73f99bc4 can we rely on all
drivers supporting default domains and so avoid false negatives, thus
even though this might apply to older kernels without conflict it would
not be functionally correct. Furthermore given the other locking and API
flow changes which also happened over that time, it's not clear how long
this race has actually been exposed. LTS fixes are going to be fiddly...

Thanks again for sharing such a detailed history.

My understanding is that the commit f188056352bc ("iommu: Avoid
locking/unlocking for iommu_probe_device()"), also sets the basis for
the current race conditions. Without this, I think, iommu_probe_device()
would always returns with default domain and later iommu_setup_dma_ops()
fills the 'dev->dma_iommu' flag. Please CMIW..

And, as the commit f188056352bc ("iommu: Avoid locking/unlocking for
iommu_probe_device()") is present from 6.6 LTS, I am not sure if we
really need to look before 6.6 LTS kernels to fix this race.

I think you're probably right - if I get time I'll double-check and tweak the writeup before I resend. From a very quick look, prior to f188056352bc indeed it seems like we shouldn't get the DMA ops confusion as the consequence, but instead we're back to potentially creating the wrong default domain type if bus_iommu_probe() hasn't found all the group devices yet, which in reality is either highly unlikely or already very likely for other reasons irrespective of this particular race.

Thanks,
Robin.

---
 drivers/iommu/iommu.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 851fd5aeccf5..9969531ad4ed 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3097,6 +3097,11 @@ int iommu_device_use_default_domain(struct device
*dev)
         return 0;
     mutex_lock(&group->mutex);
+    /* We may race against bus_iommu_probe() finalising groups here */
+    if (!group->default_domain) {
+        ret = -EPROBE_DEFER;
+        goto unlock_out;
+    }
     if (group->owner_cnt) {
         if (group->domain != group->default_domain || group->owner ||
             !xa_empty(&group->pasid_array)) {