[PATCH v2] iommu/riscv: Replace illegal command with dummy IOFENCE to prevent hardware lockup

From: Zong Li

Date: Mon Jun 29 2026 - 22:31:40 EST

When the RISC-V IOMMU encounters an illegal command, the hardware
stops processing and the HEAD register remains pointing at the
illegal command. If software does not handle this properly, the
hardware will be stuck at this index indefinitely, preventing any
further command queue operations.

This patch implements a recovery mechanism by replacing the illegal
command with a dummy IOFENCE instruction (all operands are zero):

1. Prevents hardware lockup: By overwriting the illegal command with
a valid instruction, the hardware can continue processing from the
current position instead of being stuck.

2. Enables user recovery: After replacing the illegal command, the
user/driver has an opportunity to retry the original failed
operation rather than losing all queued work.

3. Minimal hardware impact: A dummy IOFENCE behaves as a NOP, it
it performs no cache invalidation operations and has no side
effects on the system state. This is the safest replacement
instruction.

Signed-off-by: Zong Li <zong.li@xxxxxxxxxx>
---

The main goal is to at least prevent the hardware from getting stuck
and crashing the entire system.

Here are the main issues we would need to solve if we fully follow the spec:

1. IOFENCE index shift: After an illegal command, if another thread is
waiting for subsequent IOFENCE to finish, resubmitting commands will
change that IOFENCE's index in the queue. This means the waiting
thread in 'riscv_iommu_cmd_sync' might finish too early because the
'prod' value should be changed as well. We could fix this by making
IOFENCE write a sequence number to a specific address, and having the
thread wait for that data instead.

2. Timeout errors: If an illegal command happens while another thread
is trying to write a command, that thread might be waiting for the
tail to move (in 'riscv_iommu_queue_send'), and exit the wait due to a
timeout. This leads to errors in the caller subsystem (like DMA). So
it seems even if the resubmit finishes later, it might not help much.

3. Queue tail mismatch: Similar to point 2 situation, a thread waiting
in 'riscv_iommu_queue_send' expects prod == queue->tail. If we
resubmit commands quickly, queue->tail is updated asynchronously to a
farther value. The waiting thread might never see the condition met
and time out.

4. Shadow queue overhead: Inside the threading IRQ handler, we cannot
easily know what the illegal command was just by checking the current
command queue. We would need to create a "shadow command queue" to
keep a history. This would break the current driver design. We would
also need to add locks to prevent race conditions on shadow command
queue, which would reduce the driver's performance.

Considering these trade-offs, I prefer not to make the driver much
more complex and slower just to handle rare hardware errors.
Treating this hardware fault as a fatal error without trying to
recover it in software might be too extreme and would require a
hardware reset. Therefore, this patch might be a good middle ground.
It prevents the hardware lockup. Even though we might lose some commands
and cause incorrect results for the user, it at least keeps the system
alive and gives the user a chance to retry their operation again.

Changed in v1:
- Added more comments
- Rebased on v7.2-rc1

drivers/iommu/riscv/iommu.c | 32 +++++++++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index cec3ddd7ab10..c009c1906e23 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -464,13 +464,43 @@ static unsigned int riscv_iommu_queue_send(struct riscv_iommu_queue *queue,
static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data)
{
const struct riscv_iommu_queue *queue = (struct riscv_iommu_queue *)data;
- unsigned int ctrl;
+ struct riscv_iommu_command cmd;
+ unsigned int ctrl, head;

/* Clear MF/CQ errors, complete error recovery to be implemented. */
ctrl = riscv_iommu_readl(queue->iommu, queue->qcr);
if (ctrl & (RISCV_IOMMU_CQCSR_CQMF | RISCV_IOMMU_CQCSR_CMD_TO |
RISCV_IOMMU_CQCSR_CMD_ILL | RISCV_IOMMU_CQCSR_FENCE_W_IP)) {
+
+ /*
+ * Resubmitting all commands submitted since the last IOFENCE that
+ * successfully completed will introduce various race conditions.
+ * Use a dummy IOFENCE instead of the illegal command to prevent
+ * hardware lockup.
+ * Please note that some commands might be lost, including:
+ * - The task from the illegal command itself
+ * - The commands submitted between the last IOFENCE and illegal one
+ * However, this gives the user or driver a chance to retry the
+ * failed operation without resetting the enitre system
+ */
+ if (ctrl & RISCV_IOMMU_CQCSR_CMD_ILL) {
+ /*
+ * The head pointer is not updated by the hardware, it
+ * still points to the index of illegal command
+ */
+ riscv_iommu_readl_timeout(queue->iommu, Q_HEAD(queue), head,
+ !(head & ~queue->mask), 0,
+ RISCV_IOMMU_QUEUE_TIMEOUT);
+
+ memset(&cmd, 0, sizeof(cmd));
+ cmd.dword0 = FIELD_PREP(RISCV_IOMMU_CMD0_OPCODE,
+ RISCV_IOMMU_CMD_IOFENCE_OPCODE);
+ memcpy(queue->base + head * sizeof(cmd), &cmd, sizeof(cmd));
+ dma_wmb();
+ }
+
riscv_iommu_writel(queue->iommu, queue->qcr, ctrl);
+
dev_warn(queue->iommu->dev,
"Queue #%u error; fault:%d timeout:%d illegal:%d fence_w_ip:%d\n",
queue->qid,
--
2.43.7