RE: [PATCH v11 13/16] iommu: Improve iopf_queue_remove_device()

From: Tian, Kevin
Date: Tue Feb 06 2024 - 03:09:42 EST


> From: Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx>
> Sent: Monday, February 5, 2024 7:55 PM
>
> On 2024/2/5 17:00, Tian, Kevin wrote:
> >> From: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> >> Sent: Tuesday, January 30, 2024 4:09 PM
> >> *
> >> - * Caller makes sure that no more faults are reported for this device.
> >> + * Removing a device from an iopf_queue. It's recommended to follow
> >> these
> >> + * steps when removing a device:
> >> *
> >> - * Return: 0 on success and <0 on error.
> >> + * - Disable new PRI reception: Turn off PRI generation in the IOMMU
> >> hardware
> >> + * and flush any hardware page request queues. This should be done
> >> before
> >> + * calling into this helper.
> >
> > this 1st step is already not followed by intel-iommu driver. The Page
> > Request Enable (PRE) bit is set in the context entry when a device
> > is attached to the default domain and cleared only in
> > intel_iommu_release_device().
> >
> > but iopf_queue_remove_device() is called when IOMMU_DEV_FEAT_IOPF
> > is disabled e.g. when idxd driver is unbound from the device.
> >
> > so the order is already violated.
> >
> >> + * - Acknowledge all outstanding PRQs to the device: Respond to all
> >> outstanding
> >> + * page requests with IOMMU_PAGE_RESP_INVALID, indicating the
> device
> >> should
> >> + * not retry. This helper function handles this.
> >> + * - Disable PRI on the device: After calling this helper, the caller could
> >> + * then disable PRI on the device.
> >
> > intel_iommu_disable_iopf() disables PRI cap before calling this helper.
>
> You are right. The individual drivers should be adjusted accordingly in
> separated patches. Here we just define the expected behaviors of the
> individual iommu driver from the core's perspective.

can you add a note in commit msg about it?

>
> >
> >> + * - Tear down the iopf infrastructure: Calling
> iopf_queue_remove_device()
> >> + * essentially disassociates the device. The fault_param might still exist,
> >> + * but iommu_page_response() will do nothing. The device fault
> parameter
> >> + * reference count has been properly passed from
> >> iommu_report_device_fault()
> >> + * to the fault handling work, and will eventually be released after
> >> + * iommu_page_response().
> >
> > it's unclear what 'tear down' means here.
>
> It's the same as calling iopf_queue_remove_device(). Perhaps I could
> remove the confusing "tear down the iopf infrastructure"?
>

I thought it is the last step then must have something real to do.

if not then removing it is clearer.