Re: [PATCH 6/9] vfio/pci: Clean up BAR zap and revocation

From: Leon Romanovsky

Date: Tue May 05 2026 - 07:24:33 EST


On Fri, May 01, 2026 at 05:19:19PM -0600, Alex Williamson wrote:
> On Thu, 16 Apr 2026 06:17:49 -0700
> Matt Evans <mattev@xxxxxxxx> wrote:
>
> > Previously, vfio_pci_zap_bars() (and the wrapper
> > vfio_pci_zap_and_down_write_memory_lock()) calls were paired with
> > calls of vfio_pci_dma_buf_move().
> >
> > This commit replaces them a unified new function,
> > vfio_pci_zap_revoke_bars() containing both the vfio_pci_dma_buf_move()
> > and the unmap_mapping_range(), making it harder for callers to omit
> > one. It adds a wrapper, vfio_pci_lock_zap_revoke_bars(), which takes
> > the write memory_lock before zapping, and adds a new
> > vfio_pci_unrevoke_bars() for the re-enable path.
> >
> > However, as of "vfio/pci: Convert BAR mmap() to use a DMABUF" the
> > unmap_mapping_range() to zap is entirely redundant for plain vfio-pci,
> > since the DMABUFs used for BAR mappings already zap PTEs when the
> > vfio_pci_dma_buf_move() occurs.
> >
> > One exception remains as a FIXME: in nvgrace-gpu, some BAR VMAs
> > conditionally use custom vm_ops, which have not moved to be backed by
> > DMABUFs. If these BARs are mmap()ed, the vdev enables the existing
> > behaviour of unmap_mapping_range() for the device fd address space.
>
> What's the plan here? Is this a temporary FIXME or a place to prove
> that dmabuf for mmap works beyond the core use case?
>
> >
> > Signed-off-by: Matt Evans <mattev@xxxxxxxx>
> > ---
> > drivers/vfio/pci/nvgrace-gpu/main.c | 5 +++
> > drivers/vfio/pci/vfio_pci_config.c | 30 ++++++--------
> > drivers/vfio/pci/vfio_pci_core.c | 62 +++++++++++++++++++----------
> > drivers/vfio/pci/vfio_pci_priv.h | 3 +-
> > include/linux/vfio_pci_core.h | 1 +
> > 5 files changed, 62 insertions(+), 39 deletions(-)
> >
> ...
> > @@ -1229,7 +1228,7 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
> > if (!vdev->reset_works)
> > return -EINVAL;
> >
> > - vfio_pci_zap_and_down_write_memory_lock(vdev);
> > + vfio_pci_lock_zap_revoke_bars(vdev);
> >
> > /*
> > * This function can be invoked while the power state is non-D0. If
> > @@ -1242,10 +1241,9 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
> > */
> > vfio_pci_set_power_state(vdev, PCI_D0);
> >
> > - vfio_pci_dma_buf_move(vdev, true);
>
> This seems subtle enough to be troublesome. I wonder if Leon didn't
> intentionally place the dmabuf revoke after the device is in D0 to
> allow the driver to interact with the device.

My intention was to place vfio_pci_dma_buf_move() as close as possible to
pci_try_reset_function(), so the device is known to be fully operational
at that point. It looks like calling it after the transition to D0 is the
right ordering.

Thanks

> I think the lock needs to come before the power state change to avoid racing a user induced
> state change. Thanks,
>
> Alex