Re: [PATCH v11 13/16] iommu: Improve iopf_queue_remove_device()

From: Baolu Lu
Date: Mon Feb 05 2024 - 06:55:41 EST


On 2024/2/5 17:00, Tian, Kevin wrote:
From: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
Sent: Tuesday, January 30, 2024 4:09 PM
*
- * Caller makes sure that no more faults are reported for this device.
+ * Removing a device from an iopf_queue. It's recommended to follow
these
+ * steps when removing a device:
*
- * Return: 0 on success and <0 on error.
+ * - Disable new PRI reception: Turn off PRI generation in the IOMMU
hardware
+ * and flush any hardware page request queues. This should be done
before
+ * calling into this helper.

this 1st step is already not followed by intel-iommu driver. The Page
Request Enable (PRE) bit is set in the context entry when a device
is attached to the default domain and cleared only in
intel_iommu_release_device().

but iopf_queue_remove_device() is called when IOMMU_DEV_FEAT_IOPF
is disabled e.g. when idxd driver is unbound from the device.

so the order is already violated.

+ * - Acknowledge all outstanding PRQs to the device: Respond to all
outstanding
+ * page requests with IOMMU_PAGE_RESP_INVALID, indicating the device
should
+ * not retry. This helper function handles this.
+ * - Disable PRI on the device: After calling this helper, the caller could
+ * then disable PRI on the device.

intel_iommu_disable_iopf() disables PRI cap before calling this helper.

You are right. The individual drivers should be adjusted accordingly in
separated patches. Here we just define the expected behaviors of the
individual iommu driver from the core's perspective.


+ * - Tear down the iopf infrastructure: Calling iopf_queue_remove_device()
+ * essentially disassociates the device. The fault_param might still exist,
+ * but iommu_page_response() will do nothing. The device fault parameter
+ * reference count has been properly passed from
iommu_report_device_fault()
+ * to the fault handling work, and will eventually be released after
+ * iommu_page_response().

it's unclear what 'tear down' means here.

It's the same as calling iopf_queue_remove_device(). Perhaps I could
remove the confusing "tear down the iopf infrastructure"?

Best regards,
baolu