Re: [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes

From: Raj, Ashok
Date: Thu Oct 15 2020 - 14:22:16 EST


Hi Jean

+ Baolu who is looking into this.


On Thu, Oct 15, 2020 at 11:00:27AM +0200, Jean-Philippe Brucker wrote:
> Add a parameter to iommu_sva_unbind_device() that tells the IOMMU driver
> whether the PRI queue needs flushing. When looking at the PCIe spec
> again I noticed that most of the time the SMMUv3 driver doesn't actually
> need to flush the PRI queue. Does this make sense for Intel VT-d as well
> or did I overlook something?
>
> Before calling iommu_sva_unbind_device(), device drivers must stop the
> device from using the PASID. For PCIe devices, that consists of
> completing any pending DMA, and completing any pending page request
> unless the device uses Stop Markers. So unless the device uses Stop
> Markers, we don't need to flush the PRI queue. For SMMUv3, stopping DMA
> means completing all stall events, so we never need to flush the event
> queue.

I don't think this is true. Baolu is working on an enhancement to this,
I'll quickly summarize this below:

Stop markers are weird, I'm not certain there is any device today that
sends STOP markers. Even if they did, markers don't have a required
response, they are fire and forget from the device pov.

I'm not sure about other IOMMU's how they behave, When there is no space in
the PRQ, IOMMU auto-responds to the device. This puts the device in a
while (1) loop. The fake successful response will let the device do a ATS
lookup, and that would fail forcing the device to do another PRQ. The idea
is somewhere there the OS has repeated the others and this will find a way
in the PRQ. The point is this is less reliable and can't be the only
indication. PRQ draining has a specific sequence.

The detailed steps are outlined in
"Chapter 7.10 "Software Steps to Drain Page Requests & Responses"

- Submit invalidation wait with fence flag to ensure all prior
invalidations are processed.
- submit iotlb followed by devtlb invalidation
- Submit invalidation wait with page-drain to make sure any page-requests
issued by the device are flushed when this invalidation wait completes.
- If during the above process there was a queue overflow SW can assume no
outstanding page-requests are there. If we had a queue full condition,
then sw must repeat step 2,3 above.


To that extent the proposal is as follows: names are suggestive :-) I'm
making this up as I go!

- iommu_stop_page_req() - Kernel needs to make sure we respond to all
outstanding requests (since we can't drop responses)
- Ensure we respond immediatly for anything that comes before the drain
sequence completes
- iommu_drain_page_req() - Which does the above invalidation with
Page_drain set.

Once the driver has performed a reset and before issuing any new request,
it does iommu_resume_page_req() this will ensure we start processing
incoming page-req after this point.


>
> First patch adds flags to unbind(), and the second one lets device
> drivers tell whether the PRI queue needs to be flushed.
>
> Other remarks:
>
> * The PCIe spec (see quote on patch 2), says that the device signals
> whether it has sent a Stop Marker or not during Stop PASID. In reality
> it's unlikely that a given device will sometimes use one stop method
> and sometimes the other, so it could be a device-wide flag rather than
> passing it at each unbind(). I don't want to speculate too much about
> future implementation so I prefer having the flag in unbind().
>
> * In patch 1, uacce passes 0 to unbind(). To pass the right flag I'm
> thinking that uacce->ops->stop_queue(), which tells the device driver
> to stop DMA, should return whether faults are pending. This can be
> added later once uacce has an actual PCIe user, but we need to
> remember to do it.

I think intel iommmu driver does this today for SVA when pasid is being
freed. Its still important to go through the drain before that PASID can be
re-purposed.

Cheers,
Ashok