Re: [PATCH v3] iommu/vt-d: fix intel iommu iotlb sync hardlockup and retry

From: Baolu Lu

Date: Tue Mar 10 2026 - 02:03:41 EST


On 3/9/26 17:05, guanghuifeng@xxxxxxxxxxxxxxxxx wrote:
> There are some concerns:
>
> 1. While processing invalidation requests, the IOMMU first fetches requests
>    from the invalidation queue into its internal cache.
>
> 2. If an ITE (Invalidation Time-out Error) occurs while executing a request
>    fetched into the cache in step 1, the IOMMU driver clears the ITE status,
>    allowing the IOMMU to resume processing requests from the invalidation
>    queue.
>
> 3. For requests already fetched in step 1 that hit the ITE timeout, once the
>    IOMMU driver clears ITE, will the IOMMU discard these timed-out cached
>    requests, or will it execute them again?
>
> Currently, the IOMMU driver first clears ITE to resume IOMMU execution and
> only then sets desc_status to QI_ABORT.
>
> If the IOMMU re-executes requests from its cache, the IOMMU driver needs
> to be modified.

You are right.

The driver logic assumes that once an ITE error is cleared, the IOMMU
will not resume its previous execution but will instead fetch new
descriptors from the queue. This behavior was introduced by commit
6ba6c3a4cacfd ("VT-d: add device IOTLB invalidation support"), which has
been part of the driver since 2009.


> It should first set desc_status to QI_ABORT and then execute
> writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG) to resume IOMMU execution
> (in this case, some requests will be resubmitted and executed twice).
>
> Otherwise, the IOMMU may write the QI_DONE result back to desc_status after
> execution while the IOMMU driver simultaneously sets desc_status to
> QI_ABORT, leading to a race on desc_status and timing-dependent behaviour.
>
> Thanks.

Thanks,
baolu