Re: [PATCH v18 3/4] vfio/pci: Add a reset_done callback for vfio-pci driver
From: Farhan Ali
Date: Fri Jun 05 2026 - 14:43:38 EST
On 6/4/2026 1:42 PM, Keith Busch wrote:
> On Thu, Jun 04, 2026 at 10:17:04AM -0700, Farhan Ali wrote:
> > On 6/4/2026 1:28 AM, Keith Busch wrote:
> > > On Wed, Jun 03, 2026 at 11:24:14AM -0700, Farhan Ali wrote:
> > > > +static void vfio_pci_core_aer_reset_done(struct pci_dev *pdev)
> > > > +{
> > > > +	struct vfio_pci_core_device *vdev = dev_get_drvdata(&pdev->dev);
> > > > +
> > > > +	if (!vdev->pci_saved_state)
> > > > +		return;
> > > > +
> > > > +	pci_load_saved_state(pdev, vdev->pci_saved_state);
> > > > +	pci_restore_state(pdev);
> > > > +}
> > >
> > > Shouldn't there be a corresponding user space notification that the
> > > device has been restored? There's an eventfd on the error detected side
> > > so user space can know the device needs recovery, but how does it come
> > > to know that the reset is completed?
> >
> > I think if the VFIO_DEVICE_RESET ioctl completes successfully it should be
> > an indication that the reset has completed?
>
> But isn't this reset initiated by the kernel via the kernel's AER
> handler? The user space driven ioctl has nothing to do with it, unless
> I'm missing something. I'm just mentioning it as I was recently asked to
> look into DPC and AER handling for vfio, and I think there needs to be
> coordination with userspace here for a more reliable recovery.

The approach I have taken for s390x is that, on an error for PCI devices
bound to vfio, we bypass host recovery completely, so the kernel doesn't
drive the reset (see patch 1 of this series). The recovery then has to be
driven by userspace. The error_detected() callback and the eventfd notify
userspace of the error, and userspace can then drive the recovery via
VFIO_DEVICE_RESET.

For our primary use case of QEMU, once notified it injects the error into
the VM so that device drivers in the VM can take recovery actions. For
example, for a passthrough NVMe device, the VM's OS NVMe driver will
access the device. At this point the VM's NVMe driver's error_detected()
will drive the recovery by returning PCI_ERS_RESULT_NEED_RESET, and the
s390x error recovery in the VM's OS will try to do a reset. Resets are
privileged operations, so the VM needs intervention from QEMU to perform
the reset. QEMU then invokes the VFIO_DEVICE_RESET ioctl to notify the
host that the VM is requesting a reset of the device. The vfio-pci driver
on the host then performs the reset on the device to recover it.

This also aligns architecturally for us: on s390, PCI devices are exposed
as functions to the OS, so an OS can issue resets per function (with
platform firmware doing the heavy lifting).

But I am curious to learn about your thoughts on DPC/AER with vfio for
other platforms (x86/ARM?).

Thanks
Farhan
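
To make the userspace-driven flow described above concrete, here is a
minimal sketch of the userspace side: wait on the error eventfd, then
request the reset. The helper name wait_for_error_and_reset() is
hypothetical and not part of this series. It assumes err_fd was already
registered for VFIO_PCI_ERR_IRQ_INDEX via VFIO_DEVICE_SET_IRQS and that
device_fd is an open vfio device fd. The step where QEMU injects the
error into the VM and waits for the guest to ask for a reset is left out.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper (illustration only, not from this series).
 *
 * Blocks until the error eventfd fires, then asks the host to reset the
 * device. Assumes err_fd is an eventfd already registered with vfio for
 * the error IRQ index, and device_fd is an open vfio device fd.
 *
 * Returns 0 if the reset ioctl succeeded, -1 otherwise.
 */
int wait_for_error_and_reset(int err_fd, int device_fd)
{
	uint64_t count;
	ssize_t n;

	/* An eventfd read blocks until its counter becomes non-zero. */
	do {
		n = read(err_fd, &count, sizeof(count));
	} while (n < 0 && errno == EINTR);

	if (n != (ssize_t)sizeof(count))
		return -1;

	/*
	 * A VMM would normally forward the error to the guest here and
	 * only issue the reset once the guest requests it. This sketch
	 * skips that step and resets right away.
	 */
	if (ioctl(device_fd, VFIO_DEVICE_RESET) < 0)
		return -1;

	/* A successful return is what userspace treats as "reset done". */
	return 0;
}
```

With this shape, a successful return from the ioctl is the only
completion signal userspace relies on, which is the point I was trying
to make earlier in the thread.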