Re: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()

From: Jason Gunthorpe
Date: Sun Dec 03 2023 - 16:33:19 EST


On Sun, Dec 03, 2023 at 04:53:08PM +0800, Baolu Lu wrote:
> On 12/2/23 4:35 AM, Jason Gunthorpe wrote:
> > On Wed, Nov 15, 2023 at 11:02:26AM +0800, Lu Baolu wrote:
> > > The iopf_queue_flush_dev() is called by the iommu driver before releasing
> > > a PASID. It ensures that all pending faults for this PASID have been
> > > handled or cancelled, and won't hit the address space that reuses this
> > > PASID. The driver must make sure that no new fault is added to the queue.
> > This needs more explanation, why should anyone care?
> >
> > More importantly, why is*discarding* the right thing to do?
> > Especially why would we discard a partial page request group?
> >
> > After we change a translation we may have PRI requests in a
> > queue. They need to be acknowledged, not discarded. The DMA in the
> > device should be restarted and the device should observe the new
> > translation - if it is blocking then it should take a DMA error.
> >
> > More broadly, we should just let things run their normal course. The
> > domain to deliver the fault to should be determined very early. If we
> > get a fault and there is no fault domain currently assigned then just
> > restart it.
> >
> > The main reason to fence would be to allow the domain to become freed
> > as the faults should be holding pointers to it. But I feel there are
> > simpler options for that then this..
>
> In the iommu_detach_device_pasid() path, the domain is about to be
> removed from the pasid of device. The IOMMU driver performs the
> following steps sequentially:

I know that is why it does, but it doesn't explain at all why.

> 1. Clears the pasid translation entry. Thus, all subsequent DMA
> transactions (translation requests, translated requests or page
> requests) targeting the iommu domain will be blocked.
>
> 2. Waits until all pending page requests for the device's PASID have
> been reported to upper layers via the iommu_report_device_fault().
> However, this does not guarantee that all page requests have been
> responded.
>
> 3. Free all partial page requests for this pasid since the page request
> response is only needed for a complete request group. There's no
> action required for the page requests which are not last of a request
> group.

But we expect the last to come eventually since everything should be
grouped properly, so why bother doing this?

Indeed if 2 worked, how is this even possible to have partials?

> 5. Follow the IOMMU hardware requirements (for example, VT-d sepc,
> section 7.10, Software Steps to Drain Page Requests & Responses) to
> drain in-flight page requests and page group responses between the
> remapping hardware queues and the endpoint device.
>
> With above steps done in iommu_detach_device_pasid(), the pasid could be
> re-used for any other address space.

As I said, that isn't even required. There is no issue with leaking
PRI's across attachments.


> > I suppose the driver is expected to stop calling
> > iommu_report_device_fault() before calling this function, but that
> > doesn't seem like it is going to be possible. Drivers should be
> > implementing atomic replace for the PASID updates and in that case
> > there is no momement when it can say the HW will stop generating PRI.
>
> Atomic domain replacement for a PASID is not currently implemented in
> the core or driver.

It is, the driver should implement set_dev_pasid in such a way that
repeated calls do replacements, ideally atomically. This is what ARM
SMMUv3 does after my changes.

> Even if atomic replacement were to be implemented,
> it would be necessary to ensure that all translation requests,
> translated requests, page requests and responses for the old domain are
> drained before switching to the new domain.

Again, no it isn't required.

Requests simply have to continue to be acked, it doesn't matter if
they are acked against the wrong domain because the device will simply
re-issue them..

Jason