Re: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()

From: Jason Gunthorpe
Date: Mon Dec 04 2023 - 08:27:32 EST


On Mon, Dec 04, 2023 at 05:37:13AM +0000, Tian, Kevin wrote:
> > From: Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx>
> > Sent: Monday, December 4, 2023 9:33 AM
> >
> > On 12/3/23 10:14 PM, Jason Gunthorpe wrote:
> > > On Sun, Dec 03, 2023 at 04:53:08PM +0800, Baolu Lu wrote:
> > >> Even if atomic replacement were to be implemented,
> > >> it would be necessary to ensure that all translation requests,
> > >> translated requests, page requests and responses for the old domain are
> > >> drained before switching to the new domain.
> > >
> > > Again, no it isn't required.
> > >
> > > Requests simply have to continue to be acked, it doesn't matter if
> > > they are acked against the wrong domain because the device will simply
> > > re-issue them..
> >
> > Ah! I start to get your point now.
> >
> > Even if a page fault response is postponed until a new address space
> > is in place, which could be another address space or the hardware
> > blocking state, the hardware just retries.
>
> if blocking then the device shouldn't retry.

It does retry.

The device is waiting on a PRI; it gets back a completion. It issues
a new ATS (this is the retry) and the new domain responds with a
failure indication.

If the new domain had a present page it would respond with a
translation.

If the new domain has a non-present page then we get a new PRI.

The point is that from the device's perspective it is always doing
something correct.
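
A toy model may make that concrete. This is plain C with made-up names
(ats_retry_outcome() and the two enums), not any PCI or iommu API; it
only encodes the outcome table above: after a PRI completion, even one
acked against the old domain, the device retries the ATS and every
possible answer leaves it with a legal next step.

enum domain_state { PAGE_PRESENT, PAGE_NOT_PRESENT, DOMAIN_BLOCKING };
enum device_next  { DMA_PROCEEDS, NEW_PRI_RAISED, DEVICE_STOPS };

/* What the device does after retrying the ATS against whichever
 * domain happens to be attached at that moment. */
static enum device_next ats_retry_outcome(enum domain_state s)
{
	switch (s) {
	case PAGE_PRESENT:	/* translation returned */
		return DMA_PROCEEDS;
	case PAGE_NOT_PRESENT:	/* a fresh PRI goes to the new domain */
		return NEW_PRI_RAISED;
	case DOMAIN_BLOCKING:	/* failure indication, no further retry */
	default:
		return DEVICE_STOPS;
	}
}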

> btw if a stale request targets a virtual address which is outside of
> the valid VMAs of the new address space then a visible side effect
> will be incurred in handle_mm_fault() on the new space. Is that
> desired?

The whole thing is racy; if someone is radically changing the
underlying mappings while DMA is ongoing then there is no way to
synchronize 'before' and 'after' against a concurrent external device.

So who cares?

What we care about is that the ATC is coherent and never has stale
data. The invalidation after changing the translation ensures this
regardless of any outstanding un-acked PRI.
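
As a minimal sketch of that ordering (not from this series, and
assuming an ATS-capable endpoint whose IOMMU driver issues the ATC
invalidation as part of its IOTLB sync):

#include <linux/iommu.h>

/* Change the translation, then invalidate.  Nothing here waits for
 * outstanding PRIs, and nothing needs to: once the invalidation
 * completes, any ATS retry is answered from the new state. */
static void example_change_translation(struct iommu_domain *domain,
				       unsigned long iova, size_t size)
{
	/* iommu_unmap() removes the mapping and synchronizes the IOTLB
	 * before returning; for ATS-capable devices the driver's sync
	 * path also invalidates the device ATC for the range. */
	iommu_unmap(domain, iova, size);
}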

> Or if a pending response carrying an error code (Invalid Request) from
> the old address space is received by the device when the new address
> space is already activated, the hardware will report an error even
> though there might be a valid mapping in the new space.

Again, all racy. If a DMA is ongoing at the same instant things are
changed, there is no definitive way to say whether it resolved before
or after.

The only thing we care about is that DMAs that complete before the
change see the before translation and DMAs that start after it see
the after translation.

DMAs that cross choose one at random.

> I don't think atomic replace is the main usage for this draining
> requirement. Instead I'm more interested in the basic popular usage,
> attach-detach-attach, and I'm not convinced that no draining is
> required between the iommu and device to avoid interference between
> activities from the old/new address space.

Something like IDXD needs to halt DMAs on the PASID and flush all
outstanding DMA to get to a state where the PASID is quiet from the
device perspective. This is the only way to stop interference.

If the device is still issuing DMA after the domain changes then it is
never going to work right.

If *IDXD* needs some help to flush PRIs after it halts DMAs (because
it can't do it on its own for some reason) then IDXD should have an
explicit call to do that, after suspending new DMA.
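
Roughly like this, as a sketch only: example_device_drain_pasid() is
made up, and iopf_queue_flush_dev() is used purely as an example of
what such an explicit, driver-requested call could look like.

#include <linux/iommu.h>

/* Hypothetical device-specific op: quiesce the PASID and drain
 * everything the device still has in flight. */
static void example_device_drain_pasid(struct device *dev, ioasid_t pasid);

static void example_quiesce_then_detach(struct iommu_domain *domain,
					struct device *dev, ioasid_t pasid)
{
	/* 1. The device stops issuing new DMA and drains what it has. */
	example_device_drain_pasid(dev, pasid);

	/* 2. Explicit, driver-requested flush of outstanding page faults. */
	iopf_queue_flush_dev(dev);

	/* 3. Only then is the translation changed. */
	iommu_detach_device_pasid(domain, dev, pasid);
}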

We don't know what devices will need to do here; devices that are
able to wait for PRIs to complete may want a cancelling flush to speed
that up, and that shouldn't be part of the translation change.

IOW the act of halting DMA and the act of changing the translation
really should be different things. Then we get into interesting
questions like what sequence is required for a successful FLR. :\

Jason