Re: [PATCH] iommu/riscv: Replace illegal command with dummy IOFENCE to prevent hardware lockup

From: Zong Li

Date: Thu Jun 25 2026 - 03:13:13 EST

On Wed, Jun 24, 2026 at 10:08 PM <fangyu.yu@xxxxxxxxxxxxxxxxx> wrote:
>
> >When the RISC-V IOMMU encounters an illegal command, the hardware
> >stops processing and the HEAD register remains pointing at the
> >illegal command. If software does not handle this properly, the
> >hardware will be stuck at this index indefinitely, preventing any
> >further command queue operations.
> >
> >This patch implements a recovery mechanism by replacing the illegal
> >command with a dummy IOFENCE instruction (all operands are zero):
> >
> >1. Prevents hardware lockup: By overwriting the illegal command with
> > a valid instruction, the hardware can continue processing from the
> > current position instead of being stuck.
> >
> >2. Enables user recovery: After replacing the illegal command, the
> > user/driver has an opportunity to retry the original failed
> > operation rather than losing all queued work.
> >
> >3. Minimal hardware impact: A dummy IOFENCE behaves as a NOP, it
> > it performs no cache invalidation operations and has no side
> > effects on the system state. This is the safest replacement
> > instruction.
> >
> >Signed-off-by: Zong Li <zong.li@xxxxxxxxxx>
> >---
> > drivers/iommu/riscv/iommu.c | 24 +++++++++++++++++++++++-
> > 1 file changed, 23 insertions(+), 1 deletion(-)
> >
> >diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
> >index cec3ddd7ab10..6305ec5f467b 100644
> >--- a/drivers/iommu/riscv/iommu.c
> >+++ b/drivers/iommu/riscv/iommu.c
> >@@ -464,13 +464,35 @@ static unsigned int riscv_iommu_queue_send(struct riscv_iommu_queue *queue,
> > static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data)
> > {
> > const struct riscv_iommu_queue *queue = (struct riscv_iommu_queue *)data;
> >- unsigned int ctrl;
> >+ struct riscv_iommu_command cmd;
> >+ unsigned int ctrl, head;
> >
> > /* Clear MF/CQ errors, complete error recovery to be implemented. */
> > ctrl = riscv_iommu_readl(queue->iommu, queue->qcr);
> > if (ctrl & (RISCV_IOMMU_CQCSR_CQMF | RISCV_IOMMU_CQCSR_CMD_TO |
> > RISCV_IOMMU_CQCSR_CMD_ILL | RISCV_IOMMU_CQCSR_FENCE_W_IP)) {
> >+ /*
> >+ * The head pointer is not updated by the hardware, it
> >+ * still points to the index of illegal command
> >+ */
> >+ riscv_iommu_readl_timeout(queue->iommu, Q_HEAD(queue), head,
> >+ !(head & ~queue->mask), 0,
> >+ RISCV_IOMMU_QUEUE_TIMEOUT);
> >+
> >+ if (ctrl & RISCV_IOMMU_CQCSR_CMD_ILL) {
> >+ /*
> >+ * Use a dummy IOFENCE instead of the illegal command
> >+ * to prevent hardware lockup
> >+ */
>
> The RISC-V IOMMU spec 1.0 (Section 5.15, cqcsr) states:
>
> "If software makes the CQ operational again after a cmd_ill or
> cqmf error, then software should resubmit the commands submitted
> since the last IOFENCE.C that successfully completed."
>
> So it seems that simply replacing the illegal command and letting the
> queue continue is not sufficient.
>

Yes, this patch doesn't completly follow what the spec mentioned.
Initially, I wanted to implement the method suggested by the spec.
However, there are some synchronization challenges based on current
driver implementation. Therefore, I chose an approach similar to other
IOMMU drivers (like SMMU and Intel VT-d). The main goal is to at least
prevent the hardware from getting stuck and crashing the entire
system.

Here are the main issues we would need to solve if we fully follow the spec:

1. IOFENCE index shift: After an illegal command, if another thread is
waiting for subsequent IOFENCE to finish, re-submitting commands will
change that IOFENCE's index in the queue. This means the waiting
thread in 'riscv_iommu_cmd_sync' might finish too early because the
'prod' value should be changed as well. We could fix this by making
IOFENCE write a sequence number to a specific address, and having the
thread wait for that data instead.

2. Timeout errors: If an illegal command happens while another thread
is trying to write a command, that thread might be waiting for the
tail to move (in 'riscv_iommu_queue_send'), and exit the wait due to a
timeout. This leads to errors in the caller subsystem (like DMA). So
it seems even if the resubmit finishes later, it might not help much.

3. Queue tail mismatch: Similar to point 2 situation, a thread waiting
in 'riscv_iommu_queue_send' expects prod == queue->tail. If we
re-submit commands quickly, queue->tail is updated asynchronously to a
farther value. The waiting thread might never see the condition met
and time out.

4. Shadow queue overhead: Inside the threading IRQ handler, we cannot
easily know what the illegal command was just by checking the current
command queue. We would need to create a "shadow command queue" to
keep a history. This would break the current driver design. We would
also need to add locks to prevent race conditions on shadow command
queue, which would reduce the driver's performance.

Considering these trade-offs, I prefer not to make the driver much
more complex and slower just to handle rare hardware errors.
Personally, I lean towards treating this hardware fault as a fatal
error without trying to recover it in software. However, I understand
this might be too extreme and would require a hardware reset.
Therefore, this patch might be a good middle ground. It prevents the
hardware lockup. Even though we might lose some commands and cause
incorrect results for the user, it at least keeps the system alive and
gives the user a chance to retry their operation again.

> Thanks,
> Fangyu
>
> >+ memset(&cmd, 0, sizeof(cmd));
> >+ cmd.dword0 = FIELD_PREP(RISCV_IOMMU_CMD0_OPCODE,
> >+ RISCV_IOMMU_CMD_IOFENCE_OPCODE);
> >+ memcpy(queue->base + head * sizeof(cmd), &cmd, sizeof(cmd));
> >+ dma_wmb();
> >+ }
> >+
> > riscv_iommu_writel(queue->iommu, queue->qcr, ctrl);
> >+
> > dev_warn(queue->iommu->dev,
> > "Queue #%u error; fault:%d timeout:%d illegal:%d fence_w_ip:%d\n",
> > queue->qid,
> >--
> >2.43.7