RE: [PATCH v2 2/2] iommu/vt-d: Flush dev-IOTLB only when PCIe device is accessible in scalable mode

From: Tian, Kevin

Date: Tue Dec 23 2025 - 22:08:52 EST


+Bjorn for guidance.

Quick context: the intel-iommu driver previously fixed a hard-lockup on surprise
removal by checking pci_dev_is_disconnected(). But Jinhui still observed the
lockup in a setup where no interrupt reaches the PCI core upon surprise
removal (so pci_dev_is_disconnected() returns false), hence the suggestion to
replace the check with pci_device_is_present() instead.

Bjorn, is it common practice to fix this directly/only in drivers, or should
the PCI core be notified, e.g. by simulating a late removal event? Searching
the code suggests it's the former, but better to confirm with you before
picking this fix...

> From: Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx>
> Sent: Tuesday, December 23, 2025 12:06 PM
>
> On 12/22/25 19:19, Jinhui Guo wrote:
> > On Thu, Dec 18, 2025 08:04:20AM +0000, Tian, Kevin wrote:
> >>> From: Jinhui Guo<guojinhui.liam@xxxxxxxxxxxxx>
> >>> Sent: Thursday, December 11, 2025 12:00 PM
> >>>
> >>> Commit 4fc82cd907ac ("iommu/vt-d: Don't issue ATS Invalidation
> >>> request when device is disconnected") relies on
> >>> pci_dev_is_disconnected() to skip ATS invalidation for
> >>> safely-removed devices, but it does not cover link-down caused
> >>> by faults, which can still hard-lock the system.
> >> According to the commit msg it actually tries to fix the hard lockup
> >> with surprise removal. For safe removal the device is not removed
> >> before invalidation is done:
> >>
> >> "
> >> For safe removal, device wouldn't be removed until the whole software
> >> handling process is done, it wouldn't trigger the hard lock up issue
> >> caused by too long ATS Invalidation timeout wait.
> >> "
> >>
> >> Can you help articulate the problem especially about the part
> >> 'link-down caused by faults"? What are those faults? How are
> >> they different from the said surprise removal in the commit
> >> msg to not set pci_dev_is_disconnected()?
> >>
> > Hi Kevin, sorry for the delayed reply.
> >
> > A normal or surprise removal of a PCIe device on a hot-plug port normally
> > triggers an interrupt from the PCIe switch.
> >
> > We have, however, observed cases where no interrupt is generated when the
> > device suddenly loses its link; the behaviour is identical to setting the
> > Link Disable bit in the switch's Link Control register (offset 10h).
> > Exactly what goes wrong in the LTSSM between the PCIe switch and the
> > endpoint remains unknown.
>
> In this scenario, the hardware has effectively vanished, yet the device
> driver remains bound and the IOMMU resources haven't been released. I'm
> just curious whether this stale state could trigger issues in other places
> before the kernel fully realizes the device is gone. I'm not objecting
> to the fix; I'm just interested in whether this 'zombie' state creates
> risks elsewhere.
>