RE: [PATCH 3/3] iommu: Convert response code from failure to invalid

From: Tian, Kevin
Date: Wed Jul 10 2024 - 05:56:52 EST


> From: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> Sent: Wednesday, July 10, 2024 4:34 PM
>
> In the iopf deliver path, iommu_report_device_fault() currently responds
> a code of "Response Failure" to the hardware if it can't find a suitable
> iopf-capable domain where the page faults should be handled. The Response
> Failure is defined in PCI spec (section 10.4.2.1) as a catastrophic error,
> and the device will disable PRI for the function.
>
> Failing to dispatch a set of fault messages is not a catastrophic error
> and the iommu core has no code to recover the PRI once it is disabled.
>
> Replace it with a response code of "Invalid Response".
>
> Fixes: b554e396e51c ("iommu: Make iopf_group_response() return void")
> Signed-off-by: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> ---
> drivers/iommu/io-pgfault.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> index cd679c13752e..b8270ee5dcdb 100644
> --- a/drivers/iommu/io-pgfault.c
> +++ b/drivers/iommu/io-pgfault.c
> @@ -229,7 +229,7 @@ void iommu_report_device_fault(struct device *dev,
> struct iopf_fault *evt)
> err_abort:
> dev_warn_ratelimited(dev, "iopf with pasid %d aborted\n",
> fault->prm.pasid);
> - iopf_group_response(group, IOMMU_PAGE_RESP_FAILURE);
> + iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
> if (group == &abort_group)
> __iopf_free_group(group);
> else

this doesn't match the spec words on the use of INVALID:

One or more pages within the associated PRG do not exist or
requests access privilege(s) that cannot be granted. Unless the
page mapping associated with the Function is altered, re-issuance
of the associated request will never result in success.

this situation is not related to the permission of page mapping. Instead
it's a global problem applying to all functions submitting page requests
at this moment.

Though using FAILURE affects more functions sharing the PRI interface,
it also proactively avoids adding more in-fly requests to worsen the
low memory situation.

So none of the options looks perfect to me, but the existing code
responding FAILURE is more aligned to the spec?