Re: [PATCH v3] iommu/vt-d: fix intel iommu iotlb sync hardlockup and retry

From: Baolu Lu

Date: Tue Mar 10 2026 - 02:47:07 EST


On 3/10/26 14:02, Baolu Lu wrote:
On 3/9/26 17:05, guanghuifeng@xxxxxxxxxxxxxxxxx wrote:
There are some concerns:

1. While executing invalidation requests, the IOMMU first fetches requests
   from the invalidation queue into its internal cache.

2. If an ITE timeout occurs while executing a request that was fetched into
   the cache in step 1, the IOMMU driver clears the ITE status, allowing the
   IOMMU to resume processing requests from the invalidation queue.

3. For requests already fetched in step 1 that hit an ITE timeout: after the
   IOMMU driver clears the ITE, will the IOMMU discard these timed-out,
   cached requests, or will it execute them again?


Currently, the IOMMU driver first clears ITE, which lets the IOMMU resume
execution, and only then sets desc_status to QI_ABORT.

If the IOMMU re-executes requests from its cache, then the IOMMU driver
needs to be modified.

You are right.

The driver logic assumes that once an ITE error is cleared, the IOMMU
will not resume its previous execution but will instead fetch new
descriptors from the queue. This behavior was introduced by commit
6ba6c3a4cacfd ("VT-d: add device IOTLB invalidation support"), which has
been part of the driver since 2009.
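
For illustration only, the ordering under discussion can be modeled with a
toy sketch. This is not the driver's code: the struct, the status values, and
toy_handle_ite() are hypothetical names, the flag value is an assumption, and
the real Fault Status Register is written through MMIO rather than a plain
field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only; every name below is hypothetical, not the driver's. */
#define TOY_QI_LENGTH 8u
#define TOY_FSTS_ITE  (1u << 6)	/* assumed flag value for this sketch */

enum toy_status { TOY_FREE, TOY_IN_USE, TOY_DONE, TOY_ABORT };

struct toy_iommu {
	uint32_t fsts;	/* stand-in for the Fault Status Register */
	enum toy_status desc_status[TOY_QI_LENGTH];	/* per-slot software state */
};

/*
 * Handle an ITE in the order described in this thread: clear ITE first,
 * then mark in-flight slots as aborted.
 * Returns the number of slots marked TOY_ABORT.
 */
unsigned int toy_handle_ite(struct toy_iommu *iommu)
{
	unsigned int aborted = 0;
	size_t i;

	assert(iommu != NULL);

	if (!(iommu->fsts & TOY_FSTS_ITE))
		return 0;

	/* Step 1: clear ITE. On real hardware the queue may resume here. */
	iommu->fsts &= ~TOY_FSTS_ITE;

	/* Step 2: only now are the in-flight slots marked for retry. */
	for (i = 0; i < TOY_QI_LENGTH; i++) {
		if (iommu->desc_status[i] == TOY_IN_USE) {
			iommu->desc_status[i] = TOY_ABORT;
			aborted++;
		}
	}

	return aborted;
}
```

The gap between step 1 and step 2 is the window the question is about: the
ordering is only safe if the hardware does not continue with work it had
already cached before ITE was cleared.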

By the way, section 6.5.2.11 of the spec states:

"
... At the time ITE field is Set, hardware aborts any inv_wait_dsc
commands pending in hardware and does not increment the Invalidation
Queue Head register. When software clears the ITE field in the Fault
Status Register, hardware fetches descriptor pointed by the Invalidation
Queue Head register. ...
"

This implies that the hardware must abort the previous execution and
restart by fetching new requests.
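
One way to read that behavior is as a small state machine. The sketch below
is a toy model under that reading, not a description of any real
implementation: struct toy_hw and all toy_* helpers are hypothetical, and the
model treats every descriptor alike, whereas the quoted text speaks
specifically about pending inv_wait_dsc commands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names here are hypothetical. */
#define TOY_QLEN 8u
#define TOY_NONE (-1)

struct toy_hw {
	bool ite;		/* stand-in for the ITE field */
	unsigned int head;	/* stand-in for the Invalidation Queue Head */
	unsigned int tail;	/* stand-in for the Invalidation Queue Tail */
	int in_flight;		/* slot being executed, or TOY_NONE */
};

/* Fetch the descriptor at head; stalls while ITE is set or queue is empty. */
int toy_hw_fetch(struct toy_hw *hw)
{
	assert(hw != NULL);

	if (hw->ite || hw->head == hw->tail)
		return TOY_NONE;

	hw->in_flight = (int)hw->head;
	return hw->in_flight;
}

/* Successful completion is the only place where head advances. */
void toy_hw_complete(struct toy_hw *hw)
{
	if (hw->in_flight == TOY_NONE)
		return;

	hw->head = (hw->head + 1) % TOY_QLEN;
	hw->in_flight = TOY_NONE;
}

/* Timeout: set ITE, drop in-flight work, leave head untouched. */
void toy_hw_timeout(struct toy_hw *hw)
{
	hw->ite = true;
	hw->in_flight = TOY_NONE;
}

/* Software clears ITE; the next fetch starts again from head. */
void toy_sw_clear_ite(struct toy_hw *hw)
{
	hw->ite = false;
}
```

In this model a timeout at slot 2 leaves head at 2, fetching stalls until ITE
is cleared, and the next fetch returns slot 2 again. Nothing cached before the
timeout survives, so the next fetch comes from wherever the head register
points.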

Thanks,
baolu