[PATCH] iommu/riscv: Replace illegal command with dummy IOFENCE to prevent hardware lockup

From: Zong Li

Date: Tue Jun 23 2026 - 04:19:10 EST


When the RISC-V IOMMU encounters an illegal command, the hardware
stops processing and the HEAD register remains pointing at the
illegal command. If software does not handle this properly, the
hardware will be stuck at this index indefinitely, preventing any
further command queue operations.

Implement a recovery mechanism that replaces the illegal command with
a dummy IOFENCE command (all operands are zero). This has three
benefits:

1. Prevents hardware lockup: Overwriting the illegal command with a
valid one lets the hardware continue processing from the current
position instead of being stuck.

2. Enables user recovery: After replacing the illegal command, the
user/driver has an opportunity to retry the original failed
operation rather than losing all queued work.

3. Minimal hardware impact: A dummy IOFENCE behaves as a NOP. It
performs no cache invalidation and has no side effects on the
system state, which makes it the safest replacement command.
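
The replacement step can be sketched as a standalone user-space model.
This is only an illustration, not the driver code: struct cmd,
struct cmd_queue, make_dummy_iofence() and replace_illegal_cmd() are
hypothetical names, and the opcode field position (bits 6:0 of dword0)
and the IOFENCE opcode value (2) are assumptions standing in for the
driver's own RISCV_IOMMU_CMD_* definitions.

```c
#include <stdint.h>
#include <string.h>

/* Assumed encoding: opcode in bits 6:0 of dword0, IOFENCE opcode = 2. */
#define CMD_OPCODE_MASK    0x7fULL
#define CMD_OPCODE_IOFENCE 2ULL

/* Hypothetical 16-byte command entry, mirroring dword0/dword1. */
struct cmd {
	uint64_t dword0;
	uint64_t dword1;
};

/* Hypothetical ring descriptor; mask is (entries - 1), entries a power of two. */
struct cmd_queue {
	struct cmd *base;
	unsigned int mask;
};

/* Build an IOFENCE whose operands are all zero, so it acts as a NOP. */
static struct cmd make_dummy_iofence(void)
{
	struct cmd c;

	memset(&c, 0, sizeof(c));
	c.dword0 = CMD_OPCODE_IOFENCE & CMD_OPCODE_MASK;
	return c;
}

/*
 * Overwrite the entry at 'head' with the dummy IOFENCE.
 * Returns 0 on success, -1 if 'head' is outside the ring.
 * A real driver would also need a write barrier before re-enabling
 * the queue (the patch uses dma_wmb()); this model omits it.
 */
static int replace_illegal_cmd(struct cmd_queue *q, unsigned int head)
{
	if (head & ~q->mask)
		return -1;

	q->base[head] = make_dummy_iofence();
	return 0;
}
```

Only the entry at the head index is touched, so commands queued behind
the illegal one are left intact for the hardware to process.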

Signed-off-by: Zong Li <zong.li@xxxxxxxxxx>
---
drivers/iommu/riscv/iommu.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
index cec3ddd7ab10..6305ec5f467b 100644
--- a/drivers/iommu/riscv/iommu.c
+++ b/drivers/iommu/riscv/iommu.c
@@ -464,13 +464,35 @@ static unsigned int riscv_iommu_queue_send(struct riscv_iommu_queue *queue,
static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data)
{
const struct riscv_iommu_queue *queue = (struct riscv_iommu_queue *)data;
- unsigned int ctrl;
+ struct riscv_iommu_command cmd;
+ unsigned int ctrl, head;

/* Clear MF/CQ errors, complete error recovery to be implemented. */
ctrl = riscv_iommu_readl(queue->iommu, queue->qcr);
if (ctrl & (RISCV_IOMMU_CQCSR_CQMF | RISCV_IOMMU_CQCSR_CMD_TO |
RISCV_IOMMU_CQCSR_CMD_ILL | RISCV_IOMMU_CQCSR_FENCE_W_IP)) {
+ /*
+ * The hardware does not advance the head pointer; it
+ * still points to the index of the illegal command.
+ */
+ riscv_iommu_readl_timeout(queue->iommu, Q_HEAD(queue), head,
+ !(head & ~queue->mask), 0,
+ RISCV_IOMMU_QUEUE_TIMEOUT);
+
+ if (ctrl & RISCV_IOMMU_CQCSR_CMD_ILL) {
+ /*
+ * Use a dummy IOFENCE instead of the illegal command
+ * to prevent hardware lockup
+ */
+ memset(&cmd, 0, sizeof(cmd));
+ cmd.dword0 = FIELD_PREP(RISCV_IOMMU_CMD0_OPCODE,
+ RISCV_IOMMU_CMD_IOFENCE_OPCODE);
+ memcpy(queue->base + head * sizeof(cmd), &cmd, sizeof(cmd));
+ dma_wmb();
+ }
+
riscv_iommu_writel(queue->iommu, queue->qcr, ctrl);
+
dev_warn(queue->iommu->dev,
"Queue #%u error; fault:%d timeout:%d illegal:%d fence_w_ip:%d\n",
queue->qid,
--
2.43.7