[PATCH 8/8] selftests/vfio: igb: Recover after DMA-read faults
From: Alex Williamson
Date: Fri May 15 2026 - 18:06:21 EST

The mix_and_match test intentionally submits a TX descriptor with an
unmapped source IOVA so that the DMA read fails. On real 82576
hardware the resulting fault leaves the descriptor engine unable to
service subsequent valid descriptors, so the next memcpy in the same
test iteration times out.

The 82576 datasheet (section 4.2.1.6.1) describes CTRL.RST as the
software mechanism to recover from a hung device. Empirically,
CTRL.RST alone is not sufficient in this state: the visible queue
registers are reinitialized, but the next valid memcpy in the same
process still posts descriptors without any TDH/TDT progress. A
fresh device open after the failure works, which suggests that a
reset scope broader than CTRL.RST is required. The 82576 advertises
PCIe FLR; VFIO_DEVICE_RESET drives FLR and supplies that scope while
preserving the selftest process and its DMA mappings.
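
For reference, a minimal sketch of what a VFIO_DEVICE_RESET request
looks like from userspace, using only the VFIO uAPI in <linux/vfio.h>.
The helper name example_vfio_reset() is hypothetical and is not part of
this patch; the selftest library's vfio_pci_device_reset() may be
implemented differently.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper, not part of this patch: reset a VFIO device only
 * if the kernel reports that a reset method is available. Returns 0 on
 * success or a negative errno value.
 */
static int example_vfio_reset(int device_fd)
{
	struct vfio_device_info info;

	memset(&info, 0, sizeof(info));
	info.argsz = sizeof(info);

	if (ioctl(device_fd, VFIO_DEVICE_GET_INFO, &info) < 0)
		return -errno;

	/* The kernel sets this flag when the device can be reset. */
	if (!(info.flags & VFIO_DEVICE_FLAGS_RESET))
		return -ENOTSUP;

	/* The kernel chooses the reset method; userspace only asks. */
	if (ioctl(device_fd, VFIO_DEVICE_RESET) < 0)
		return -errno;

	return 0;
}
```

The device fd stays open across the reset, which is why the process
and its DMA mappings survive it.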

Add igb_error_reset_and_reinit() implementing the recovery sequence:
issue VFIO_DEVICE_RESET, re-arm the kernel-side MSI-X trigger against
the still-valid eventfd via vfio_pci_irq_reenable() (this does not
touch the eventfd, which test fixtures may have cached), and
re-program the device via igb_hw_init(). FLR clears EICR and leaves
EIMS=0, so no explicit interrupt mask or cause writes are needed.
igb_hw_init() resets tx_tail/rx_tail to 0 and igb_memcpy_start() zeros
each descriptor before submission, so no ring memset is needed either.

Call this from igb_memcpy_wait() on completion timeout, preceded by a
10 ms delay so that PCIe/IOMMU/AER error handling triggered by the
just-observed DMA fault can release the device lock VFIO_DEVICE_RESET
contends for. The delay is heuristic and tied to the fault path, so
it lives at the call site rather than inside the reset helper. The
failed memcpy still returns -ETIMEDOUT; reset recovery only ensures
the next operation starts from a usable device state.
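
The control flow described above can be modelled without hardware.
The sketch below is illustrative only: struct fake_dev and the fake_*
functions are hypothetical stand-ins, and the real igb_memcpy_wait()
polls the descriptor for completion rather than checking a flag.

```c
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for device state; not the selftest's struct igb. */
struct fake_dev {
	bool wedged;      /* engine no longer services descriptors */
	bool irq_armed;   /* kernel-side interrupt trigger is set up */
	bool programmed;  /* rings and registers are initialized */
	int resets;       /* number of recoveries performed */
};

/* Same order as the patch: reset, re-arm the IRQ, re-program. */
static void fake_reset_and_reinit(struct fake_dev *dev)
{
	/* Stand-in for VFIO_DEVICE_RESET: clears the wedge and the trigger. */
	dev->wedged = false;
	dev->irq_armed = false;
	dev->programmed = false;
	dev->resets++;

	dev->irq_armed = true;   /* stand-in for vfio_pci_irq_reenable() */
	dev->programmed = true;  /* stand-in for igb_hw_init() */
}

static int fake_memcpy_wait(struct fake_dev *dev)
{
	if (!dev->wedged)
		return 0;

	/* Heuristic delay at the call site, then recover for the next caller. */
	usleep(10000);
	fake_reset_and_reinit(dev);

	/* The operation that timed out still fails. */
	return -ETIMEDOUT;
}
```

In this model the first wait on a wedged device returns -ETIMEDOUT and
the next wait returns 0, which matches the intent that recovery only
benefits the following operation.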

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
---
.../selftests/vfio/lib/drivers/igb/igb.c | 40 +++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/tools/testing/selftests/vfio/lib/drivers/igb/igb.c b/tools/testing/selftests/vfio/lib/drivers/igb/igb.c
index ef242ebd9d2e..07d1a907f18a 100644
--- a/tools/testing/selftests/vfio/lib/drivers/igb/igb.c
+++ b/tools/testing/selftests/vfio/lib/drivers/igb/igb.c
@@ -443,6 +443,28 @@ static void igb_memcpy_start(struct vfio_pci_device *device, iova_t src,
igb_write32(igb, IGB_TDT0, igb->tx_tail);
}
+/*
+ * Reset the device via VFIO_DEVICE_RESET (PCIe FLR on the 82576) and
+ * re-program it. VFIO_DEVICE_RESET tears down the kernel-side MSI-X
+ * trigger but leaves user-side eventfds intact, so re-arm the trigger
+ * via vfio_pci_irq_reenable() before reprogramming so any caller-cached
+ * eventfd remains valid.
+ *
+ * FLR returns device-side state to power-on reset values (datasheet
+ * 4.2.1.5.1: a PF FLR is "equivalent to a D0->D3->D0 transition"), so
+ * EIMS and EICR return to their register-defined initial value of 0,
+ * and igb_hw_init() resets tx_tail/rx_tail to 0. The next
+ * igb_memcpy_start() zeroes each descriptor it touches before
+ * submission, so no explicit IMC/EICR writes or ring memsets are
+ * needed here.
+ */
+static void igb_error_reset_and_reinit(struct vfio_pci_device *device)
+{
+	vfio_pci_device_reset(device);
+	vfio_pci_irq_reenable(device, VFIO_PCI_MSIX_IRQ_INDEX, MSIX_VECTOR, 1);
+	igb_hw_init(device);
+}
+
static int igb_memcpy_wait(struct vfio_pci_device *device)
{
struct igb *igb = to_igb_state(device);
@@ -478,6 +500,24 @@ static int igb_memcpy_wait(struct vfio_pci_device *device)
if (rx->wb.status_error & 1)
return 0;
+	/*
+	 * The descriptor never completed. On real 82576 hardware this
+	 * typically follows a DMA-read fault from one of the intentional
+	 * unmapped-IOVA tests; the fault leaves the descriptor engine
+	 * unable to service subsequent valid descriptors. CTRL.RST alone
+	 * reinitializes the queue registers but leaves the engine wedged
+	 * for the current process, so a broader VFIO_DEVICE_RESET (FLR)
+	 * is required.
+	 *
+	 * Delay before requesting reset so PCIe/IOMMU/AER error handling
+	 * triggered by the just-observed DMA fault can release the device
+	 * lock VFIO_DEVICE_RESET contends for. The 10 ms value is
+	 * heuristic. The current memcpy still fails with -ETIMEDOUT;
+	 * recovery only ensures the next memcpy starts from a usable state.
+	 */
+	usleep(10000);
+	igb_error_reset_and_reinit(device);
+
return -ETIMEDOUT;
}
--
2.51.0