Re: [PATCH 2/3] vfio-pci: Fault mmaps to enable vma tracking

From: Jason Gunthorpe
Date: Mon May 04 2020 - 11:06:00 EST


On Mon, May 04, 2020 at 08:20:55AM -0600, Alex Williamson wrote:
> On Fri, 1 May 2020 20:25:50 -0300
> Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> > On Fri, May 01, 2020 at 03:39:19PM -0600, Alex Williamson wrote:
> > > Rather than calling remap_pfn_range() when a region is mmap'd, setup
> > > a vm_ops handler to support dynamic faulting of the range on access.
> > > This allows us to manage a list of vmas actively mapping the area that
> > > we can later use to invalidate those mappings. The open callback
> > > invalidates the vma range so that all tracking is inserted in the
> > > fault handler and removed in the close handler.
> > >
> > > Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > drivers/vfio/pci/vfio_pci.c | 76 ++++++++++++++++++++++++++++++++++-
> > > drivers/vfio/pci/vfio_pci_private.h | 7 +++
> > > 2 files changed, 81 insertions(+), 2 deletions(-)
> >
> > > +static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> > > +{
> > > + struct vm_area_struct *vma = vmf->vma;
> > > + struct vfio_pci_device *vdev = vma->vm_private_data;
> > > +
> > > + if (vfio_pci_add_vma(vdev, vma))
> > > + return VM_FAULT_OOM;
> > > +
> > > + if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
> > > + vma->vm_end - vma->vm_start, vma->vm_page_prot))
> > > + return VM_FAULT_SIGBUS;
> > > +
> > > + return VM_FAULT_NOPAGE;
> > > +}
> > > +
> > > +static const struct vm_operations_struct vfio_pci_mmap_ops = {
> > > + .open = vfio_pci_mmap_open,
> > > + .close = vfio_pci_mmap_close,
> > > + .fault = vfio_pci_mmap_fault,
> > > +};
> > > +
> > > static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
> > > {
> > > struct vfio_pci_device *vdev = device_data;
> > > @@ -1357,8 +1421,14 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
> > > vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> > > vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
> > >
> > > - return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
> > > - req_len, vma->vm_page_prot);
> > > + /*
> > > + * See remap_pfn_range(), called from vfio_pci_fault() but we can't
> > > + * change vm_flags within the fault handler. Set them now.
> > > + */
> > > + vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
> > > + vma->vm_ops = &vfio_pci_mmap_ops;
> >
> > Perhaps do the vfio_pci_add_vma & remap_pfn_range combo here if the
> > BAR is activated ? That way a fully populated BAR is presented in the
> > common case and avoids taking a fault path?
> >
> > But it does seem OK as is
>
> Thanks for reviewing. There's also an argument that we defer
> remap_pfn_range() until the device is actually touched, which might
> reduce the startup latency.

But not startup to a functional VM as that will now have to take the
slower fault path.

> It's also a bit inconsistent with the vm_ops.open() path where I
> can't return error, so I can't call vfio_pci_add_vma(), I can only
> zap the vma so that the fault handler can return an error if
> necessary.

open could allocate memory so the zap isn't needed. If allocation
fails then do the zap and take the slow path.

> handler. If there's a good reason to do otherwise, I can make the
> change, but I doubt I'd have encountered the dma mapping of an
> unfaulted vma issue had I done it this way, so maybe there's a test
> coverage argument as well. Thanks,

This test is best done by having one thread disable the other bar
while another thread is trying to map it

Jason