On Thu, Feb 29, 2024 at 09:58:43AM +0800, Ethan Zhao wrote:
On 2/27/2024 7:05 AM, Bjorn Helgaas wrote:OK, I acked the patch.
On Thu, Feb 22, 2024 at 08:54:54PM +0800, Baolu Lu wrote:The logic of testing pci_dev_is_disconnected() in patch [2/3] works
On 2024/2/22 17:02, Ethan Zhao wrote:This wording is fundamentally wrong. Testing
Make pci_dev_is_disconnected() public so that it can be called fromHi PCI subsystem maintainers,
Intel VT-d driver to quickly fix/workaround the surprise removal
unplug hang issue for those ATS capable devices on PCIe switch downstream
hotplug capable ports.
Beside pci_device_is_present() function, this one has no config space
space access, so is light enough to optimize the normal pure surprise
removal and safe removal flow.
Tested-by: Haorong Ye<yehaorong@xxxxxxxxxxxxx>
Signed-off-by: Ethan Zhao<haifeng.zhao@xxxxxxxxxxxxxxx>
---
drivers/pci/pci.h | 5 -----
include/linux/pci.h | 5 +++++
2 files changed, 5 insertions(+), 5 deletions(-)
The iommu drivers (including, but not limited to, the Intel VT-d driver)
require a helper to check the physical presence of a PCI device in two
scenarios:
- During the iommu_release_device() path: This ensures the device is
physically present before sending device TLB invalidation to device.
pci_dev_is_disconnected() can never ensure the device will still be
present by the time a TLB invalidation is sent.
in the opposite:
1. if pci_dev_is_disconnected() return true, means the device is in
the process of surprise removal handling, adapter already been
removed from the slot.
2. for removed device, no need to send ATS invalidation request to it.
removed device lost power, its devTLB wouldn't be valid anymore.
3. if pci_dev_is_disconnected() return false, the device is *likely*
to be removed at any momoment after this function called.
such case will be treated in the iommu ITE fault handling, not to
retry the timeout request if device isn't present (patch [3/3]).
The device may be removed after the pci_dev_is_disconnected() test andeven in the process while TLB is invalidating.
before a TLB invalidate is sent.
This is why I hesitate to expose pci_dev_is_disconnected() (andYup, your concern is worthy ,but isn't happening within this patchset.
pci_device_is_present(), which we already export) outside
drivers/pci/. They both lead to terrible mistakes like relying on the
false assumption that the result will remain valid after the functions
return, without any recognition that we MUST be able to deal with the
cases where that assumption is broken.
I guess my complaint is really with pci_device_is_present() because
that's even harder to use correctly.
pci_device_is_present():
slow (may do config access to device)
true => device *was* present in the recent past, may not be now
false => device is not accessible
pci_dev_is_disconnected():
fast (doesn't touch device)
true => device is not accessible
false => basically means nothing
I guess they're both hard ;)
Bjorn