Re: [PATCH v3] iommu/vt-d: fix intel iommu iotlb sync hardlockup and retry
From: guanghuifeng@xxxxxxxxxxxxxxxxx
Date: Mon Mar 09 2026 - 05:05:54 EST
There are some concerns:
1. When executing invalidation requests, the IOMMU first fetches the requests
from the invalidation queue into its internal cache.
2. If an ITE (Invalidation Time-out Error) occurs while a request fetched into the cache in step 1 is executing,
the IOMMU driver clears the ITE status, allowing the IOMMU to resume processing requests from the invalidation queue.
3. For requests already fetched in step 1 that hit the ITE: after the IOMMU driver clears the ITE,
will the IOMMU discard these timed-out, cached requests, or will it execute them again?
Currently, the IOMMU driver first clears ITE, which resumes IOMMU execution,
and only then sets desc_status to QI_ABORT.
If the IOMMU re-executes requests from its cache, the IOMMU driver needs to be modified:
it should first set desc_status to QI_ABORT and then call writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG)
to resume IOMMU execution. (In that case, some requests will be resubmitted and executed twice.)
Otherwise, the IOMMU may write QI_DONE back to desc_status after execution while the IOMMU driver
concurrently sets desc_status to QI_ABORT, which is a data race on desc_status with timing-dependent results.
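To make the two orderings concrete, here is a minimal user-space sketch. It is not the driver code: struct fake_iommu, ite_recover() and the slot numbers are hypothetical, DMA_FSTS_ITE is assumed to be bit 6 as in the kernel header, and the model says nothing about whether the hardware really replays cached requests. It only counts how many slots are still QI_IN_USE at the moment ITE is cleared, i.e. the slots a late QI_DONE write-back could race with.

```c
#include <assert.h>
#include <stdint.h>

#define QI_LENGTH    256          /* queue depth, assumed for this model */
#define DMA_FSTS_ITE (1u << 6)    /* assumed to match the kernel header */

enum qi_state { QI_FREE, QI_IN_USE, QI_DONE, QI_ABORT };

/* Hypothetical stand-in for the IOMMU state touched by qi_check_fault(). */
struct fake_iommu {
	uint32_t fsts;                          /* models DMAR_FSTS_REG */
	enum qi_state desc_status[QI_LENGTH];   /* models qi->desc_status */
};

static int count_in_use(const struct fake_iommu *m)
{
	int i, n = 0;

	for (i = 0; i < QI_LENGTH; i++)
		if (m->desc_status[i] == QI_IN_USE)
			n++;
	return n;
}

/* Walk backwards from head until tail is reached, aborting in-flight slots. */
static void abort_in_flight(struct fake_iommu *m, int head, int tail)
{
	do {
		if (m->desc_status[head] == QI_IN_USE)
			m->desc_status[head] = QI_ABORT;
		head = (head - 1 + QI_LENGTH) % QI_LENGTH;
	} while (head != tail);
}

/* ITE is write-1-to-clear; clearing it lets the (modelled) IOMMU resume. */
static void clear_ite(struct fake_iommu *m)
{
	m->fsts &= ~DMA_FSTS_ITE;
}

/*
 * Returns the number of slots still QI_IN_USE when ITE is cleared.
 * abort_first == 0 models the current order, abort_first != 0 the proposed one.
 */
static int ite_recover(struct fake_iommu *m, int head, int tail, int abort_first)
{
	int exposed;

	assert(head >= 0 && head < QI_LENGTH && tail >= 0 && tail < QI_LENGTH);

	if (abort_first) {
		abort_in_flight(m, head, tail);
		exposed = count_in_use(m);
		clear_ite(m);
	} else {
		exposed = count_in_use(m);
		clear_ite(m);
		abort_in_flight(m, head, tail);
	}
	return exposed;
}
```

With slots 4..6 in use, head = 6 and tail = 1, ite_recover(&m, 6, 1, 0) returns 3 and ite_recover(&m, 6, 1, 1) returns 0. Whether the second order is actually needed depends on the answer to question 3 above.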
Thanks.
On 2026/3/6 18:15, Guanghui Feng wrote:
When qi_check_fault handles an IOMMU ITE event, only the requests
at odd-numbered positions in the queue are set to QI_ABORT, which
is sufficient only for single-request submissions. However,
qi_submit_sync now supports submitting multiple requests at once
and can't guarantee that the wait_desc will be at an odd-numbered
position. Therefore, after an ITE timeout the wait_desc may never
be set to QI_ABORT, the request is never resubmitted, and the
result is an infinite polling wait.
This patch modifies the process by setting the status of all requests
already fetched by the IOMMU and recorded as QI_IN_USE (including
wait_desc requests) to QI_ABORT, thus enabling multiple requests to
be resubmitted.
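The difference between the two walks can be reproduced with a small user-space model of the index arithmetic alone. This is only a sketch: abort_walk_old(), abort_walk_new() and the slot layout are hypothetical, QI_LENGTH is assumed to be 256, and the iteration bound in the old walk exists only in the model to keep it from spinning.

```c
#include <assert.h>

#define QI_LENGTH 256   /* queue depth, assumed for this model */

enum qi_state { QI_FREE, QI_IN_USE, QI_DONE, QI_ABORT };

/*
 * Old walk: force head to an odd slot and step back by 2, so only
 * odd-numbered slots are ever examined. Returns the number of slots aborted.
 */
static int abort_walk_old(enum qi_state *st, int head, int tail)
{
	int i, n = 0;

	head |= 1;
	/* Bounded in this model only; the kernel loop has no such bound. */
	for (i = 0; i < QI_LENGTH; i++) {
		if (st[head] == QI_IN_USE) {
			st[head] = QI_ABORT;
			n++;
		}
		head = (head - 2 + QI_LENGTH) % QI_LENGTH;
		if (head == tail)
			break;
	}
	return n;
}

/*
 * New walk: step back by 1, so every slot between head and tail is
 * examined regardless of parity. Returns the number of slots aborted.
 */
static int abort_walk_new(enum qi_state *st, int head, int tail)
{
	int n = 0;

	assert(head >= 0 && head < QI_LENGTH && tail >= 0 && tail < QI_LENGTH);

	do {
		if (st[head] == QI_IN_USE) {
			st[head] = QI_ABORT;
			n++;
		}
		head = (head - 1 + QI_LENGTH) % QI_LENGTH;
	} while (head != tail);
	return n;
}
```

For example, with a hypothetical batch occupying slots 4 and 5 plus a wait_desc in slot 6, head = 6 and tail = 1, the old walk aborts only slot 5 and leaves the wait_desc at QI_IN_USE, while the new walk aborts slots 4, 5 and 6.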
Signed-off-by: Guanghui Feng <guanghuifeng@xxxxxxxxxxxxxxxxx>
Reviewed-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>
---
drivers/iommu/intel/dmar.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index d68c06025cac..69222dbd2af0 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1314,7 +1314,6 @@ static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index)
if (fault & DMA_FSTS_ITE) {
head = readl(iommu->reg + DMAR_IQH_REG);
head = ((head >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
- head |= 1;
tail = readl(iommu->reg + DMAR_IQT_REG);
tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
@@ -1331,7 +1330,7 @@ static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index)
do {
if (qi->desc_status[head] == QI_IN_USE)
qi->desc_status[head] = QI_ABORT;
- head = (head - 2 + QI_LENGTH) % QI_LENGTH;
+ head = (head - 1 + QI_LENGTH) % QI_LENGTH;
} while (head != tail);
/*