Re: [PATCH v4 14/22] iommu: handle page response timeout

From: Jacob Pan
Date: Mon Apr 30 2018 - 13:51:52 EST


On Mon, 30 Apr 2018 11:58:10 +0100
Jean-Philippe Brucker <jean-philippe.brucker@xxxxxxx> wrote:

> On 25/04/18 16:37, Jacob Pan wrote:
> >> In the other cases (unsupported PRI or rogue guest) then disabling
> >> PRI using a FAILURE status might be the right thing to do. However,
> >> assuming the device follows the PCI spec it will stop sending page
> >> requests once there are as many PPRs in flight as the allocated
> >> credit.
> >>
> > Agreed, here I am not taking any actions. There may be need to drain
> > in-fly requests.
>
> Right, as long as we first ensure that no new fault is generated (by
> using a Response Failure). Though in my opinion not taking action
> might be the safest option :)
>
> Another thought: currently the comment in iommu.h says
> "@IOMMU_FAULT_STATUS_FAILURE: General error. Drop all subsequent
> faults from this device if possible. This is "Response Failure" in
> PCI PRI."
>
> I wonder if we should simply say "Drop all subsequent faults from the
> device". Even if the PCI device doesn't properly implement PRI, the
> IOMMU driver should set a "PRI disabled" bit in the device data that
> prevents it from from reporting new faults and flooding the queue.
> Anyway, it's a small detail that could go in a future patch series.
>
right, we should disable PRI and let future PRQ response pending on
re-enabling of PRI on the device. I will leave that to future
enhancement.
> >> If there isn't any possibility of memory leak or abusing
> >> resources, I don't think it's our problem that the guest is
> >> excessively slow at handling page requests. Setting an upper bound
> >> to page request latency might do more harm than good. Ensuring
> >> that devices respect the number of allocated in-flight PPRs is
> >> more important in my opinion.
> > How about we have a really long timeout, e.g. 1 min similar to
> > device invalidate response timeout in ATS spec., just for basic
> > safety and diagnosis. Optionally, we could have quota in parallel.
>
> I agree that for development a timeout is useful. It might be worth
> adding it as an option to the IOMMU module instead of a define.
> Perhaps a number of seconds, 10 being the default and 0 disabling the
> timeout? Otherwise we would probably end up with a succession of
> patches incrementing the timeout by arbitrary values, if people find
> it inconvenient.
>
make sense. will do.