Re: [PATCH 1/3] vfio: Introduce vma ops registration and notifier

From: Jason Gunthorpe
Date: Fri Feb 12 2021 - 16:22:41 EST


On Fri, Feb 12, 2021 at 12:27:39PM -0700, Alex Williamson wrote:
> Create an interface through vfio-core where a vfio bus driver (ex.
> vfio-pci) can register the vm_operations_struct it uses to map device
> memory, along with a set of registration callbacks. This allows
> vfio-core to expose interfaces for IOMMU backends to match a
> vm_area_struct to a bus driver and register a notifier for relevant
> changes to the device mapping. For now we define only a notifier
> action for closing the device.
>
> Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> drivers/vfio/vfio.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/vfio.h | 20 ++++++++
> 2 files changed, 140 insertions(+)
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 38779e6fd80c..568f5e37a95f 100644
> +++ b/drivers/vfio/vfio.c
> @@ -47,6 +47,8 @@ static struct vfio {
> struct cdev group_cdev;
> dev_t group_devt;
> wait_queue_head_t release_q;
> + struct list_head vm_ops_list;
> + struct mutex vm_ops_lock;
> } vfio;
>
> struct vfio_iommu_driver {
> @@ -2354,6 +2356,121 @@ struct iommu_domain *vfio_group_iommu_domain(struct vfio_group *group)
> }
> EXPORT_SYMBOL_GPL(vfio_group_iommu_domain);
>
> +struct vfio_vma_ops {
> + const struct vm_operations_struct *vm_ops;
> + vfio_register_vma_nb_t *reg_fn;
> + vfio_unregister_vma_nb_t *unreg_fn;
> + struct list_head next;
> +};
> +
> +int vfio_register_vma_ops(const struct vm_operations_struct *vm_ops,
> + vfio_register_vma_nb_t *reg_fn,
> + vfio_unregister_vma_nb_t *unreg_fn)

This just feels a little bit too complicated.

I've recently learned from Daniel that we can use the address_space
machinery to drive the zap_vma_ptes() via unmap_mapping_range(). This
technique replaces all the open, close and vma_list logic in vfio_pci.

If vfio_pci doesn't need its own open/close anymore, we could point
them at common helpers in vfio.c, something like this:

static const struct vm_operations_struct vfio_pci_mmap_ops = {
	.open = vfio_pfn_open, // implemented in vfio.c
	.close = vfio_pfn_close,
	.fault = vfio_pci_mmap_fault,
};

Then we could code the function needed:

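struct vfio_pfn_range_handle {
	struct kref kref;
	struct vfio_device *vfio;
	/* a notifier head, since srcu_notifier_call_chain() is used on it */
	struct srcu_notifier_head invalidation_cb;
	unsigned long flags;	/* bitops operate on unsigned long */
};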

struct vfio_pfn_range_handle *get_pfn_range(struct vm_area_struct *vma)
{
	struct vfio_pfn_range_handle *handle;

	/* Only VMAs using the common vfio open helper carry a handle */
	if (!vma->vm_ops || vma->vm_ops->open != vfio_pfn_open)
		return NULL;

	handle = vma->vm_private_data;
	if (test_bit(DMA_STOPPED, &handle->flags))
		return NULL;
	kref_get(&handle->kref);
	return handle;
}

Where the common open/close only take and drop a reference on the kref,
and all 'vfio' VMAs always have a pointer to the same
vfio_pfn_range_handle in their vm_private_data.
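
To make that lifetime rule concrete, here is a minimal userspace sketch
(not kernel code). It assumes a plain atomic counter can stand in for
struct kref and an atomic flag for the DMA_STOPPED bit. Every name in it
(pfn_range_handle, fake_vma, handle_alloc and so on) is hypothetical and
exists only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the proposed vfio_pfn_range_handle. */
struct pfn_range_handle {
	atomic_int refs;	/* models struct kref */
	atomic_bool stopped;	/* models the DMA_STOPPED flag bit */
};

/* Stand-in for a VMA: only the private-data pointer matters here. */
struct fake_vma {
	struct pfn_range_handle *priv;
};

/* Allocate a handle holding one reference, owned by the first VMA. */
struct pfn_range_handle *handle_alloc(void)
{
	struct pfn_range_handle *h = malloc(sizeof(*h));

	if (!h)
		return NULL;
	atomic_init(&h->refs, 1);
	atomic_init(&h->stopped, false);
	return h;
}

/* Drop one reference; returns true if it was the last, as kref_put() does. */
bool handle_put(struct pfn_range_handle *h)
{
	int old = atomic_fetch_sub(&h->refs, 1);

	assert(old > 0);
	if (old == 1) {
		free(h);
		return true;
	}
	return false;
}

/* Common open: the VMA was duplicated, so the copy takes a reference. */
void fake_vma_open(struct fake_vma *vma)
{
	atomic_fetch_add(&vma->priv->refs, 1);
}

/* Common close: the VMA is going away, so drop its reference. */
bool fake_vma_close(struct fake_vma *vma)
{
	return handle_put(vma->priv);
}

/* Mark the handle stopped so no new users can obtain it. */
void handle_stop(struct pfn_range_handle *h)
{
	atomic_store(&h->stopped, true);
}

/* Model of get_pfn_range(): hand out a reference unless stopped. */
struct pfn_range_handle *fake_get_pfn_range(struct fake_vma *vma)
{
	struct pfn_range_handle *h = vma->priv;

	if (atomic_load(&h->stopped))
		return NULL;
	atomic_fetch_add(&h->refs, 1);
	return h;
}
```

The check-then-increment in fake_get_pfn_range() is not race free against
a concurrent handle_stop(); the sketch leaves locking out entirely.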

The vm_pgoff is already pointing at the physical pfn, so every part of
the system can get the information it needs fairly trivially.

A stop-access function then looks pretty simple:

void stop_access(struct vfio_pfn_range_handle *handle)
{
	/* Refuse new users first, then zap the PTEs, then tell listeners */
	set_bit(DMA_STOPPED, &handle->flags);
	unmap_mapping_range(handle->vfio->[..]->inode, 0, max, false);
	srcu_notifier_call_chain(&handle->invalidation_cb, VFIO_VMA_NOTIFY_CLOSE, NULL);
}

(well, have to sort out the locking some more, but that is the
general idea)
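
The ordering in stop_access() (flag, then zap, then notify) can be
modelled in userspace as below. This is only a single-threaded sketch
with no locking: a fixed array of callbacks stands in for the SRCU
notifier chain and a boolean for the effect of unmap_mapping_range().
All names here (access_handle, access_register, access_stop,
count_close, NOTIFY_CLOSE) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LISTENERS 4
#define NOTIFY_CLOSE 1UL	/* stands in for VFIO_VMA_NOTIFY_CLOSE */

typedef void (*listener_fn)(unsigned long action, void *ctx);

/* Hypothetical model of the handle: a flag plus a tiny notifier list. */
struct access_handle {
	bool stopped;		/* models the DMA_STOPPED bit */
	bool zapped;		/* models "PTEs have been unmapped" */
	size_t nr;
	listener_fn fn[MAX_LISTENERS];
	void *ctx[MAX_LISTENERS];
};

/* Register a listener; fails once access has been stopped or list is full. */
int access_register(struct access_handle *h, listener_fn fn, void *ctx)
{
	assert(fn != NULL);
	if (h->stopped || h->nr == MAX_LISTENERS)
		return -1;
	h->fn[h->nr] = fn;
	h->ctx[h->nr] = ctx;
	h->nr++;
	return 0;
}

/* Model of stop_access(): idempotent, and strictly ordered. */
void access_stop(struct access_handle *h)
{
	size_t i;

	if (h->stopped)
		return;
	h->stopped = true;	/* 1. no new users can get the handle */
	h->zapped = true;	/* 2. stands in for unmap_mapping_range() */
	for (i = 0; i < h->nr; i++)	/* 3. tell every listener */
		h->fn[i](NOTIFY_CLOSE, h->ctx[i]);
}

/* Example listener: an IOMMU backend would drop its DMA mapping here. */
void count_close(unsigned long action, void *ctx)
{
	if (action == NOTIFY_CLOSE)
		(*(int *)ctx)++;
}
```

Setting the flag before notifying means a listener that races with the
stop cannot re-acquire the handle while it is being torn down, which is
the property the real locking would have to preserve.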

I think that would remove a lot of the code added here, and it would
act a lot closer to how a dmabuf could someday act.

Also, the nvlink vm_ops will need the same update.

Jason