Re: [PATCH] iommu/riscv: Replace illegal command with dummy IOFENCE to prevent hardware lockup

From: fangyu . yu

Date: Fri Jun 26 2026 - 09:50:19 EST

>On Wed, Jun 24, 2026 at 10:08 PM <fangyu.yu@xxxxxxxxxxxxxxxxx> wrote:
>>
>> >When the RISC-V IOMMU encounters an illegal command, the hardware
>> >stops processing and the HEAD register remains pointing at the
>> >illegal command. If software does not handle this properly, the
>> >hardware will be stuck at this index indefinitely, preventing any
>> >further command queue operations.
>> >
>> >This patch implements a recovery mechanism by replacing the illegal
>> >command with a dummy IOFENCE instruction (all operands are zero):
>> >
>> >1. Prevents hardware lockup: By overwriting the illegal command with
>> > a valid instruction, the hardware can continue processing from the
>> > current position instead of being stuck.
>> >
>> >2. Enables user recovery: After replacing the illegal command, the
>> > user/driver has an opportunity to retry the original failed
>> > operation rather than losing all queued work.
>> >
>> >3. Minimal hardware impact: A dummy IOFENCE behaves as a NOP, it
>> > it performs no cache invalidation operations and has no side
>> > effects on the system state. This is the safest replacement
>> > instruction.
>> >
>> >Signed-off-by: Zong Li <zong.li@xxxxxxxxxx>
>> >---
>> > drivers/iommu/riscv/iommu.c | 24 +++++++++++++++++++++++-
>> > 1 file changed, 23 insertions(+), 1 deletion(-)
>> >
>> >diff --git a/drivers/iommu/riscv/iommu.c b/drivers/iommu/riscv/iommu.c
>> >index cec3ddd7ab10..6305ec5f467b 100644
>> >--- a/drivers/iommu/riscv/iommu.c
>> >+++ b/drivers/iommu/riscv/iommu.c
>> >@@ -464,13 +464,35 @@ static unsigned int riscv_iommu_queue_send(struct riscv_iommu_queue *queue,
>> > static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data)
>> > {
>> > const struct riscv_iommu_queue *queue = (struct riscv_iommu_queue *)data;
>> >- unsigned int ctrl;
>> >+ struct riscv_iommu_command cmd;
>> >+ unsigned int ctrl, head;
>> >
>> > /* Clear MF/CQ errors, complete error recovery to be implemented. */
>> > ctrl = riscv_iommu_readl(queue->iommu, queue->qcr);
>> > if (ctrl & (RISCV_IOMMU_CQCSR_CQMF | RISCV_IOMMU_CQCSR_CMD_TO |
>> > RISCV_IOMMU_CQCSR_CMD_ILL | RISCV_IOMMU_CQCSR_FENCE_W_IP)) {
>> >+ /*
>> >+ * The head pointer is not updated by the hardware, it
>> >+ * still points to the index of illegal command
>> >+ */
>> >+ riscv_iommu_readl_timeout(queue->iommu, Q_HEAD(queue), head,
>> >+ !(head & ~queue->mask), 0,
>> >+ RISCV_IOMMU_QUEUE_TIMEOUT);
>> >+
>> >+ if (ctrl & RISCV_IOMMU_CQCSR_CMD_ILL) {
>> >+ /*
>> >+ * Use a dummy IOFENCE instead of the illegal command
>> >+ * to prevent hardware lockup
>> >+ */
>>
>> The RISC-V IOMMU spec 1.0 (Section 5.15, cqcsr) states:
>>
>> "If software makes the CQ operational again after a cmd_ill or
>> cqmf error, then software should resubmit the commands submitted
>> since the last IOFENCE.C that successfully completed."
>>
>> So it seems that simply replacing the illegal command and letting the
>> queue continue is not sufficient.
>>
>
>Yes, this patch doesn't completly follow what the spec mentioned.
>Initially, I wanted to implement the method suggested by the spec.
>However, there are some synchronization challenges based on current
>driver implementation. Therefore, I chose an approach similar to other
>IOMMU drivers (like SMMU and Intel VT-d). The main goal is to at least
>prevent the hardware from getting stuck and crashing the entire
>system.

Thanks for the detailed explanation. I understand the practical
challenges you described. However, I’d like to note that the ARM
SMMUv3 and Intel VT-d specs appear to provide somewhat different
guarantees from the RISC-V IOMMU spec with regard to error recovery.

ARM SMMUv3 (IHI 0070, Section 7.1):
"Older commands are unaffected by a later Command queue error."

Intel VT-d (Section 6.5.2.10):
"Any invalidation commands ahead of the invalid descriptor that
are already fetched and pending in hardware at the time of
detecting the invalid descriptor error are completed by
hardware as normal."

>
>Here are the main issues we would need to solve if we fully follow the spec:
>
>1. IOFENCE index shift: After an illegal command, if another thread is
>waiting for subsequent IOFENCE to finish, re-submitting commands will
>change that IOFENCE's index in the queue. This means the waiting
>thread in 'riscv_iommu_cmd_sync' might finish too early because the
>'prod' value should be changed as well. We could fix this by making
>IOFENCE write a sequence number to a specific address, and having the
>thread wait for that data instead.
>
>2. Timeout errors: If an illegal command happens while another thread
>is trying to write a command, that thread might be waiting for the
>tail to move (in 'riscv_iommu_queue_send'), and exit the wait due to a
>timeout. This leads to errors in the caller subsystem (like DMA). So
>it seems even if the resubmit finishes later, it might not help much.
>
>3. Queue tail mismatch: Similar to point 2 situation, a thread waiting
>in 'riscv_iommu_queue_send' expects prod == queue->tail. If we
>re-submit commands quickly, queue->tail is updated asynchronously to a
>farther value. The waiting thread might never see the condition met
>and time out.
>
>4. Shadow queue overhead: Inside the threading IRQ handler, we cannot
>easily know what the illegal command was just by checking the current
>command queue. We would need to create a "shadow command queue" to
>keep a history. This would break the current driver design. We would
>also need to add locks to prevent race conditions on shadow command
>queue, which would reduce the driver's performance.
>
>Considering these trade-offs, I prefer not to make the driver much
>more complex and slower just to handle rare hardware errors.
>Personally, I lean towards treating this hardware fault as a fatal
>error without trying to recover it in software. However, I understand
>this might be too extreme and would require a hardware reset.
>Therefore, this patch might be a good middle ground. It prevents the
>hardware lockup. Even though we might lose some commands and cause
>incorrect results for the user, it at least keeps the system alive and
>gives the user a chance to retry their operation again.
>

I understand the trade-off. Given the complexity of full spec
compliance, this simplified version may not actually address the
underlying issue very well, so I’d suggest making its limitations
explicit. Specifically, could you document in the commit message or
code comments which scenarios it can handle and which ones may
still remain problematic?

>
>> Thanks,
>> Fangyu
>>
>> >+ memset(&cmd, 0, sizeof(cmd));
>> >+ cmd.dword0 = FIELD_PREP(RISCV_IOMMU_CMD0_OPCODE,
>> >+ RISCV_IOMMU_CMD_IOFENCE_OPCODE);
>> >+ memcpy(queue->base + head * sizeof(cmd), &cmd, sizeof(cmd));
>> >+ dma_wmb();
>> >+ }
>> >+
>> > riscv_iommu_writel(queue->iommu, queue->qcr, ctrl);
>> >+
>> > dev_warn(queue->iommu->dev,
>> > "Queue #%u error; fault:%d timeout:%d illegal:%d fence_w_ip:%d\n",
>> > queue->qid,
>> >--
>> >2.43.7