Re: [PATCH v1 6/6] vfio/nvgrace-gpu: vfio/nvgrace-gpu: wait for the GPU mem to be ready

From: Alex Williamson
Date: Mon Nov 17 2025 - 19:45:53 EST


On Mon, 17 Nov 2025 12:41:59 +0000
<ankita@xxxxxxxxxx> wrote:

> From: Ankit Agrawal <ankita@xxxxxxxxxx>
>
> Speculative prefetches from the CPU to GPU memory before the GPU
> is ready after reset can cause harmless corrected RAS events
> to be logged. It is thus expected that the mapping not be
> re-established until the GPU is ready post reset.
>
> Wait for the GPU to be ready on the first fault before establishing
> the CPU mapping to GPU memory. GPU readiness can be checked
> through BAR0 registers, as is already done at device probe.
>
> The state is checked on the first fault/huge_fault request using
> a flag. Unset the flag on every reset request.
>
> So intercept the following GPU reset paths and unset
> gpu_mem_mapped in each; then use the flag to determine whether
> to wait before mapping:
> 1. VFIO_DEVICE_RESET ioctl call
> 2. FLR through config space
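
A minimal userspace model of the flag scheme described in the patch, under stated assumptions: struct sim_gpu, sim_gpu_ready(), sim_gpu_reset() and sim_gpu_fault() are hypothetical stand-ins, the BAR0 readiness register is simulated by a countdown, and the locking a real fault handler would need against concurrent faults is omitted. It only illustrates that the readiness wait runs once per reset rather than on every fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the device state. In the driver, readiness would be
 * read from a BAR0 register and gpu_mem_mapped would live in the vfio device. */
struct sim_gpu {
	bool gpu_mem_mapped;   /* cleared on reset, set once readiness is confirmed */
	int polls_until_ready; /* simulated: failed register reads before "ready" */
	int wait_count;        /* how many times the readiness wait has run */
};

/* Simulated readiness read: reports not-ready until the countdown expires. */
static bool sim_gpu_ready(struct sim_gpu *gpu)
{
	if (gpu->polls_until_ready > 0) {
		gpu->polls_until_ready--;
		return false;
	}
	return true;
}

/* Every reset path (VFIO_DEVICE_RESET, FLR via config space) clears the flag. */
static void sim_gpu_reset(struct sim_gpu *gpu, int polls_until_ready)
{
	gpu->gpu_mem_mapped = false;
	gpu->polls_until_ready = polls_until_ready;
}

/* Fault path: only the first fault after a reset waits for readiness.
 * Returns 0 when the mapping may be established, -1 on timeout. */
static int sim_gpu_fault(struct sim_gpu *gpu, int max_polls)
{
	assert(gpu != NULL);
	if (!gpu->gpu_mem_mapped) {
		bool ready = false;

		gpu->wait_count++;
		for (int i = 0; i < max_polls; i++) {
			if (sim_gpu_ready(gpu)) {
				ready = true;
				break;
			}
			/* a real driver would sleep between reads */
		}
		if (!ready)
			return -1;
		gpu->gpu_mem_mapped = true;
	}
	/* ...establish the CPU mapping of GPU memory here... */
	return 0;
}
```

With this model, a second fault after a successful wait returns immediately, and only another reset forces the wait again.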

If we need a stall after reset based on some device-specific readiness
criteria, shouldn't we just implement a device-specific reset? We can
create a reset callback that calls pcie_reset_flr() and then pci_iomap()s
the BAR to poll the device. See for example delay_250ms_after_flr()
and nvme_disable_and_flr(). Thanks,

Alex
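
A rough userspace sketch of the alternative suggested in the reply: a device-specific reset that performs the FLR and then polls a readiness bit before returning, so no fault-time flag is needed. All names here (struct sim_dev, sim_flr(), sim_read_status(), sim_reset_and_wait(), SIM_STATUS_READY) are hypothetical; the comments mark where pcie_reset_flr(), a read such as ioread32() on the pci_iomap()ed BAR, and a sleep between reads would go in a kernel implementation. The actual GPU readiness register and timeout are not given in this thread and are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_STATUS_READY 0x1u /* hypothetical readiness bit */

/* Hypothetical simulated device with a BAR0 status register. */
struct sim_dev {
	uint32_t bar0_status;
	int reads_until_ready; /* simulated: reads before the ready bit sets */
	int flr_count;
};

/* Stands in for pcie_reset_flr(): clears status, restarts the countdown. */
static void sim_flr(struct sim_dev *dev, int reads_until_ready)
{
	dev->flr_count++;
	dev->bar0_status = 0;
	dev->reads_until_ready = reads_until_ready;
}

/* Stands in for a register read on the pci_iomap()ed BAR. */
static uint32_t sim_read_status(struct sim_dev *dev)
{
	if (dev->reads_until_ready > 0)
		dev->reads_until_ready--;
	else
		dev->bar0_status |= SIM_STATUS_READY;
	return dev->bar0_status;
}

/* Device-specific reset: FLR, then poll until ready or max_reads runs out. */
static int sim_reset_and_wait(struct sim_dev *dev, int reads_until_ready,
			      int max_reads)
{
	assert(dev != NULL);
	sim_flr(dev, reads_until_ready);
	for (int i = 0; i < max_reads; i++) {
		if (sim_read_status(dev) & SIM_STATUS_READY)
			return 0;
		/* a kernel implementation would sleep here between reads */
	}
	return -ETIMEDOUT;
}
```

The trade-off is that the caller of the reset blocks until the device is ready (or the poll times out), whereas the patch defers that wait to the first fault on GPU memory.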