Re: [PATCH rc] iommu/arm-smmu-v3: Drain in-flight fault handlers
From: Will Deacon
Date: Tue Mar 24 2026 - 10:14:54 EST
On Thu, Mar 12, 2026 at 11:25:09AM -0300, Jason Gunthorpe wrote:
> On Thu, Mar 12, 2026 at 01:51:26PM +0000, Will Deacon wrote:
> > On Fri, Mar 06, 2026 at 04:17:23PM -0800, Nicolin Chen wrote:
> > > From: Malak Marrid <mmarrid@xxxxxxxxxx>
> > >
> > > When a device is switching away from a domain, either through a detach or a
> > > replace operation, it must drain its IOPF queue that only contains the page
> > > requests for the old domain.
> > >
> > > Currently, the IOPF infrastructure is used by master->stall_enabled. So the
> > > stalled transaction for the old domain should be resumed/terminated. Fix it
> > > properly.
> > >
> > > Fixes: cfea71aea921 ("iommu/arm-smmu-v3: Put iopf enablement in the domain attach path")
> > > Cc: stable@xxxxxxxxxxxxxxx
> > > Co-developed-by: Barak Biber <bbiber@xxxxxxxxxx>
> > > Signed-off-by: Barak Biber <bbiber@xxxxxxxxxx>
> > > Co-developed-by: Stefan Kaestle <skaestle@xxxxxxxxxx>
> > > Signed-off-by: Stefan Kaestle <skaestle@xxxxxxxxxx>
> > > Signed-off-by: Malak Marrid <mmarrid@xxxxxxxxxx>
> > > Signed-off-by: Nicolin Chen <nicolinc@xxxxxxxxxx>
> > > ---
> > > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 11 ++++++++++-
> > > 1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > index 4d00d796f0783..2176ee8bec767 100644
> > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > @@ -2843,6 +2843,12 @@ static int arm_smmu_enable_iopf(struct arm_smmu_master *master,
> > > if (master->iopf_refcount) {
> > > master->iopf_refcount++;
> > > master_domain->using_iopf = true;
> > > + /*
> > > + * If the device is already on the IOPF queue (domain replace),
> > > + * drain in-flight fault handlers so nothing will hold the old
> > > + * domain when the core switches the attach handle.
> > > + */
> > > + iopf_queue_flush_dev(master->dev);
> >
> > So this drains the iopf workqueue, but don't you still have a race with
> > the hardware generating a fault on the old domain and then that only
> > showing up once you've switched to the new one? What is the actual
> > problem you're trying to solve with this patch?
>
> HW doesn't generate faults on domains, it calls
> iommu_report_device_fault() which calls find_fault_handler() that
> uses iommu_attach_handle_get() to find the domain. It then shoves the
> domain pointer onto a WQ.
Sorry, that was sloppy terminology on my part. I'm trying to reason about
faults that are generated by accesses that were translated with the
page-tables of the old domain being reported once we think we are using
the new domain.
> The ordering is supposed to be
> 1) IOMMU HW starts using the new domain
> 2) iommu_attach_handle_get() returns the new domain
> 3) IOMMU driver flushes its own IRQs/queues that may be concurrently
> calling iommu_attach_handle_get()
Does that mean we should kick the evtq thread? I'm not sure what this
means for the priq.
> 4) iopf_queue_flush_dev() to clear the iopf work queue
> 5) domain is freed, no pointers in WQs or other threads
>
> So the naked iopf_queue_flush_dev() doesn't seem right, I'd expect a
> synchronize_irq() (is that right for threaded IRQs?) too as the
> threaded IRQ is concurrently calling iommu_attach_handle_get().
I don't think we can rely on the IRQ being taken, though, so presumably
we have to kick the irq thread manually and see what's actually sitting
in the event queue after the CMD_SYNC?
Will