Re: [PATCH v11 05/22] vfio iommu: Added pin and unpin callback functions to vfio_iommu_driver_ops

From: Alex Williamson
Date: Tue Nov 08 2016 - 11:39:59 EST


On Tue, 8 Nov 2016 19:25:35 +0530
Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:

> On 11/8/2016 1:06 AM, Alex Williamson wrote:
> > On Sat, 5 Nov 2016 02:40:39 +0530
> > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> >
> ...
> >> +int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
> >> + int npage, int prot, unsigned long *phys_pfn)
> >> +{
> >> + struct vfio_container *container;
> >> + struct vfio_group *group;
> >> + struct vfio_iommu_driver *driver;
> >> + int ret;
> >> +
> >> + if (!dev || !user_pfn || !phys_pfn)
> >> + return -EINVAL;
> >> +
> >> + group = vfio_group_get_from_dev(dev);
> >> + if (IS_ERR(group))
> >> + return PTR_ERR(group);
> >> +
> >> + ret = vfio_group_add_container_user(group);
> >> + if (ret)
> >> + goto err_pin_pages;
> >> +
> >> + container = group->container;
> >> + down_read(&container->group_lock);
> >> +
> >> + driver = container->iommu_driver;
> >> + if (likely(driver && driver->ops->pin_pages))
> >> + ret = driver->ops->pin_pages(container->iommu_data, user_pfn,
> >> + npage, prot, phys_pfn);
> >> + else
> >> + ret = -EINVAL;
> >
> > -ENOTTY might be a more appropriate error return here and below since
> > we're not signaling invalid argument, we're signaling lack of support.
> >
>
> Used -EINVAL in sync with other driver->ops like read, write and mmap.
> Changing it to -ENOTTY as you suggested above since these ops are optional.

TBH, I'm not sure which is better, but it's nice to try to
differentiate different error paths with different errno values.

> ...
>
> >> -static long vfio_pin_pages(unsigned long vaddr, long npage,
> >> - int prot, unsigned long *pfn_base)
> >> +static long __vfio_pin_pages_remote(unsigned long vaddr, long npage,
> >> + int prot, unsigned long *pfn_base)
> >
> > nit, what is the additional underscore prefix intended to imply?
> > Appending _remote is sufficient to avoid the symbol conflict.
> >
>
> This function name changed in review process from start, we started with
> changing to __vfio_pin_pages and then added _remote to it later. We can
> remove '__' from it. Updating.
>
> ...
>
> >> -
> >> + int (*pin_pages)(void *iommu_data, unsigned long *user_pfn,
> >> + int npage, int prot,
> >> + unsigned long *phys_pfn);
> >> + int (*unpin_pages)(void *iommu_data,
> >
> > Are we changing from long to int here simply because of the absurdity
> > in passing in more than a 2^31 entry array, that would already consume
> > more than 16GB itself?
> >
>
> These are on demand pin/unpin request, will that request go beyond 16GB
> limit? For Nvidia vGPU solution, pin request will not go beyond this limit.

16G is simply the size of the user_pfn or phys_pfn arrays at a maximal
int32_t npage value, the interface actually allows mapping up to 8TB
per call, but at that point we have 16GB of input, 16GB of output, and
80GB of vfio_pfns created. So I don't really have a problem changing
form long to int given lack of scalability in the API in general, but
it does make me second guess the API itself. Thanks,

Alex

> >> + unsigned long *user_pfn,
> >> + unsigned long *pfn,
> >
> > nit, use phys_pfn so as to match the pin function?
> >
>
> Ok.
>
> >> + int npage);
> >> };
> >>
> >> extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
> >> @@ -127,6 +133,12 @@ static inline long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
> >> }
> >> #endif /* CONFIG_EEH */
> >>
> >> +extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
> >> + int npage, int prot, unsigned long *phys_pfn);
> >> +
> >> +extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
> >> + unsigned long *pfn, int npage);
> >> +
> >> /*
> >> * IRQfd - generic
> >> */
> >
>
> Thanks,
> Kirti