RE: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present

From: Tian, Kevin
Date: Tue Jan 30 2024 - 04:25:28 EST


> From: Ethan Zhao <haifeng.zhao@xxxxxxxxxxxxxxx>
> Sent: Tuesday, January 30, 2024 5:13 PM
>
> On 1/30/2024 4:43 PM, Tian, Kevin wrote:
> >> From: Ethan Zhao <haifeng.zhao@xxxxxxxxxxxxxxx>
> >> Sent: Tuesday, January 30, 2024 4:16 PM
> >>
> >> On 1/30/2024 2:22 PM, Tian, Kevin wrote:
> >>> Here we need consider two situations.
> >>>
> >>> One is that the device is not bound to a driver or bound to a driver
> >>> which doesn't do active work to the device when it's removed. In
> >>> that case one may observe the timeout situation only in the removal
> >>> path as the stack dump in your patch02 shows.
> >> When iommu_bus_notifier() got called for hotplug removal cases to
> >> flush devTLB (ATS invalidation), driver was already unloaded.
> >> whatever safe removal or surprise removal. so in theory no active
> >> driver working there.
> >>
> >> pciehp_ist()
> >> pciehp_disable_slot()
> >> remove_board()
> >> pciehp_unconfigure_device()
> >> pci_stop_and_remove_bus_device()
> >> pci_stop_bus_device()--->here unload driver
> >> pci_remove_bus_device()->here qi_flush_dev_iotlb() got called.
> > yes, so patch02 can fix this case.
> >
> >>> patch02 can fix that case by checking whether the device is present
> >>> to skip sending the invalidation requests. So the logic being discussed
> >>> here doesn't matter.
> >>>
> >>> The 2nd situation is more tricky. The device might be bound to
> >>> a driver which is doing active work to the device with in-fly
> >>> ATS invalidation requests. In this case in-fly requests must be aborted
> >>> before the driver can be detached from the removed device.
> Conceptually
> >>> a device is removed from the bus only after its driver is detached.
> >> Some tricky situations:
> >>
> >> 1. The ATS invalidation request is issued from driver driver, while it is
> >> in handling, device is removed. this momment, the device instance still
> >> exists in the bus list. yes, if searching it by BDF, could get it.
> > it's searchable between the point where the device is removed and the
> > point where the driver is unloaded:
> >
> > CPU0 CPU1
> > (Driver is active) (pciehp handler)
> > qi_submit_sync() pciehp_ist()
> > ... ...
> > loop for completion() { pciehp_unconfigure_device()
> > ... pci_dev_set_disconnected()
> > if (ITE) { ...
> > //find pci_dev from sid pci_remove_bus_device()
> > if (pci_dev_is_connected()) device_del()
> > break; bus_remove_device()
> > } device_remove_driver()
>
> If the device was hot plugin or re-scanned, the device has a PCI_DEV_ADDED
> flag,

in this case is pci_dev_is_disconnected() true or false?

how is this patch supposed to work with it?

> if so the driver unloading work isn't defered to the tail of device_del(), it
> is unloaded before pci_remove_bus_device()->device_del(), in pci_stop_dev
>
> pci_stop_bus_device()
> pci_stop_dev()
> {
> if (pci_dev_is_added(dev)) {
> device_release_driver(&dev->dev);
> }

no matter where driver unload is requested, it needs to wait for aborting
in-fly request on CPU0.

>
> So the interval the device is searchable, only applied to those devices
> not hot plugged, or never be scanned.
>

and in the worst case even if pci_dev is not searchable, isn't it already
an indicator that the device is absent then qi_submit_sync() should
just exit upon ITE?