RE: [PATCH v1 1/8] vfio: Add VFIO_IOMMU_PASID_REQUEST(alloc/free)
From: Tian, Kevin
Date: Wed Apr 01 2020 - 01:43:19 EST
> From: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> Sent: Tuesday, March 31, 2020 9:22 PM
>
> > From: Tian, Kevin <kevin.tian@xxxxxxxxx>
> > Sent: Tuesday, March 31, 2020 1:41 PM
> > To: Liu, Yi L <yi.l.liu@xxxxxxxxx>; alex.williamson@xxxxxxxxxx;
> > eric.auger@xxxxxxxxxx
> > Subject: RE: [PATCH v1 1/8] vfio: Add
> VFIO_IOMMU_PASID_REQUEST(alloc/free)
> >
> > > From: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> > > Sent: Monday, March 30, 2020 10:37 PM
> > >
> > > > From: Tian, Kevin <kevin.tian@xxxxxxxxx>
> > > > Sent: Monday, March 30, 2020 4:32 PM
> > > > To: Liu, Yi L <yi.l.liu@xxxxxxxxx>; alex.williamson@xxxxxxxxxx;
> > > > Subject: RE: [PATCH v1 1/8] vfio: Add
> > > VFIO_IOMMU_PASID_REQUEST(alloc/free)
> > > >
> > > > > From: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> > > > > Sent: Sunday, March 22, 2020 8:32 PM
> > > > >
> > > > > From: Liu Yi L <yi.l.liu@xxxxxxxxx>
> > > > >
> > > > > For a long time, devices have only one DMA address space from
> > > > > platform IOMMU's point of view. This is true for both bare metal
> > > > > and directed- access in virtualization environment. Reason is the
> > > > > source ID of DMA in PCIe are BDF (bus/dev/fnc ID), which results
> > > > > in only device granularity
> > > >
> > > > are->is
> > >
> > > thanks.
> > >
> > > > > DMA isolation. However, this is changing with the latest
> > > > > advancement in I/O technology area. More and more platform
> vendors
> > > > > are utilizing the
> > > PCIe
> > > > > PASID TLP prefix in DMA requests, thus to give devices with
> > > > > multiple DMA address spaces as identified by their individual
> > > > > PASIDs. For example, Shared Virtual Addressing (SVA, a.k.a Shared
> > > > > Virtual Memory) is able to let device access multiple process
> > > > > virtual address space by binding the
> > > >
> > > > "address space" -> "address spaces"
> > > >
> > > > "binding the" -> "binding each"
> > >
> > > will correct both.
> > >
> > > > > virtual address space with a PASID. Wherein the PASID is allocated
> > > > > in software and programmed to device per device specific manner.
> > > > > Devices which support PASID capability are called PASID-capable
> > > > > devices. If such devices are passed through to VMs, guest software
> > > > > are also able to bind guest process virtual address space on such
> > > > > devices. Therefore, the guest software could reuse the bare metal
> > > > > software programming model,
> > > which
> > > > > means guest software will also allocate PASID and program it to
> > > > > device directly. This is a dangerous situation since it has
> > > > > potential PASID conflicts and unauthorized address space access.
> > > > > It would be safer to let host intercept in the guest software's
> > > > > PASID allocation. Thus PASID are managed system-wide.
> > > > >
> > > > > This patch adds VFIO_IOMMU_PASID_REQUEST ioctl which aims to
> > > > > passdown PASID allocation/free request from the virtual IOMMU.
> > > > > Additionally, such
> > > >
> > > > "Additionally, because such"
> > > >
> > > > > requests are intended to be invoked by QEMU or other applications
> > > which
> > > >
> > > > simplify to "intended to be invoked from userspace"
> > >
> > > got it.
> > >
> > > > > are running in userspace, it is necessary to have a mechanism to
> > > > > prevent single application from abusing available PASIDs in
> > > > > system. With such consideration, this patch tracks the VFIO PASID
> > > > > allocation per-VM. There was a discussion to make quota to be per
> > > > > assigned devices. e.g. if a VM has many assigned devices, then it
> > > > > should have more quota. However, it is not sure how many PASIDs an
> > > > > assigned devices will use. e.g. it is
> > > >
> > > > devices -> device
> > >
> > > got it.
> > >
> > > > > possible that a VM with multiples assigned devices but requests
> > > > > less PASIDs. Therefore per-VM quota would be better.
> > > > >
> > > > > This patch uses struct mm pointer as a per-VM token. We also
> > > > > considered using task structure pointer and vfio_iommu structure
> > > > > pointer. However, task structure is per-thread, which means it
> > > > > cannot achieve per-VM PASID alloc tracking purpose. While for
> > > > > vfio_iommu structure, it is visible only within vfio. Therefore,
> > > > > structure mm pointer is selected. This patch adds a structure
> > > > > vfio_mm. A vfio_mm is created when the first vfio container is
> > > > > opened by a VM. On the reverse order, vfio_mm is free when the
> > > > > last vfio container is released. Each VM is assigned with a PASID
> > > > > quota, so that it is not able to request PASID beyond its quota.
> > > > > This patch adds a default quota of 1000. This quota could be tuned
> > > > > by administrator. Making PASID quota tunable will be added in
> > > > > another
> > > patch
> > > > > in this series.
> > > > >
> > > > > Previous discussions:
> > > > > https://patchwork.kernel.org/patch/11209429/
> > > > >
> > > > > Cc: Kevin Tian <kevin.tian@xxxxxxxxx>
> > > > > CC: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> > > > > Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > > > Cc: Eric Auger <eric.auger@xxxxxxxxxx>
> > > > > Cc: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> > > > > Signed-off-by: Liu Yi L <yi.l.liu@xxxxxxxxx>
> > > > > Signed-off-by: Yi Sun <yi.y.sun@xxxxxxxxxxxxxxx>
> > > > > Signed-off-by: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> > > > > ---
> > > > > drivers/vfio/vfio.c | 130
> > > > > ++++++++++++++++++++++++++++++++++++++++
> > > > > drivers/vfio/vfio_iommu_type1.c | 104
> > > > > ++++++++++++++++++++++++++++++++
> > > > > include/linux/vfio.h | 20 +++++++
> > > > > include/uapi/linux/vfio.h | 41 +++++++++++++
> > > > > 4 files changed, 295 insertions(+)
> > > > >
> > > > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index
> > > > > c848262..d13b483 100644
> > > > > --- a/drivers/vfio/vfio.c
> > > > > +++ b/drivers/vfio/vfio.c
> > > > > @@ -32,6 +32,7 @@
> > > > > #include <linux/vfio.h>
> > > > > #include <linux/wait.h>
> > > > > #include <linux/sched/signal.h>
> > > > > +#include <linux/sched/mm.h>
> > > > >
> > > > > #define DRIVER_VERSION "0.3"
> > > > > #define DRIVER_AUTHOR "Alex Williamson
> > > > > <alex.williamson@xxxxxxxxxx>"
> > > > > @@ -46,6 +47,8 @@ static struct vfio {
> > > > > struct mutex group_lock;
> > > > > struct cdev group_cdev;
> > > > > dev_t group_devt;
> > > > > + struct list_head vfio_mm_list;
> > > > > + struct mutex vfio_mm_lock;
> > > > > wait_queue_head_t release_q;
> > > > > } vfio;
> > > > >
> > > > > @@ -2129,6 +2132,131 @@ int vfio_unregister_notifier(struct device
> > > *dev,
> > > > > enum vfio_notify_type type,
> > > > > EXPORT_SYMBOL(vfio_unregister_notifier);
> > > > >
> > > > > /**
> > > > > + * VFIO_MM objects - create, release, get, put, search
> > > >
> > > > why capitalizing vfio_mm?
> > >
> > > oops, it's not intended, will fix it.
> > >
> > > > > + * Caller of the function should have held vfio.vfio_mm_lock.
> > > > > + */
> > > > > +static struct vfio_mm *vfio_create_mm(struct mm_struct *mm) {
> > > > > + struct vfio_mm *vmm;
> > > > > + struct vfio_mm_token *token;
> > > > > + int ret = 0;
> > > > > +
> > > > > + vmm = kzalloc(sizeof(*vmm), GFP_KERNEL);
> > > > > + if (!vmm)
> > > > > + return ERR_PTR(-ENOMEM);
> > > > > +
> > > > > + /* Per mm IOASID set used for quota control and group
> operations
> > > > > */
> > > > > + ret = ioasid_alloc_set((struct ioasid_set *) mm,
> > > > > + VFIO_DEFAULT_PASID_QUOTA, &vmm-
> > > > > >ioasid_sid);
> > > > > + if (ret) {
> > > > > + kfree(vmm);
> > > > > + return ERR_PTR(ret);
> > > > > + }
> > > > > +
> > > > > + kref_init(&vmm->kref);
> > > > > + token = &vmm->token;
> > > > > + token->val = mm;
> > > > > + vmm->pasid_quota = VFIO_DEFAULT_PASID_QUOTA;
> > > > > + mutex_init(&vmm->pasid_lock);
> > > > > +
> > > > > + list_add(&vmm->vfio_next, &vfio.vfio_mm_list);
> > > > > +
> > > > > + return vmm;
> > > > > +}
> > > > > +
> > > > > +static void vfio_mm_unlock_and_free(struct vfio_mm *vmm) {
> > > > > + /* destroy the ioasid set */
> > > > > + ioasid_free_set(vmm->ioasid_sid, true);
> > > >
> > > > do we need hold pasid lock here, since it attempts to destroy a set
> > > > which might be referenced by vfio_mm_pasid_free? or is there
> > > > guarantee that such race won't happen?
> > >
> > > Emmm, if considering the race between ioasid_free_set and
> > > vfio_mm_pasid_free, I guess ioasid core should sequence the two
> > > operations with its internal lock. right?
> >
> > I looked at below code in free path:
> >
> > + mutex_lock(&vmm->pasid_lock);
> > + pdata = ioasid_find(vmm->ioasid_sid, pasid, NULL);
> > + if (IS_ERR(pdata)) {
> > + ret = PTR_ERR(pdata);
> > + goto out_unlock;
> > + }
> > + ioasid_free(pasid);
> > +
> > +out_unlock:
> > + mutex_unlock(&vmm->pasid_lock);
> >
> > above implies that ioasid_find/free must be paired within the pasid_lock.
> > Then if we don't hold pasid_lock above, ioasid_free_set could happen
> between
> > find/free. I'm not sure whether this race would lead to real problem, but it
> doesn't
> > look correct simply by looking at this file.
>
> Well, Jacob told me to remove the ioasid_find in another email as he
> believes ioasid core should be able to take care of it. and also need to
> be protected by lock. If so, does it look good? :-)
>
> " [Jacob Pan] this might be better to put under ioasid code such that it
> is under the ioasid lock. no one else can free the ioasid between find() and
> free().
> e.g. ioasid_free(sid, pasid)
> if sid == INVALID_IOASID_SET, then no set ownership checking.
> thoughts?"
yes, that way looks better.
>
> > >
> > > > > + mutex_unlock(&vfio.vfio_mm_lock);
> > > > > + kfree(vmm);
> > > > > +}
> > > > > +
> > > > > +/* called with vfio.vfio_mm_lock held */ static void
> > > > > +vfio_mm_release(struct kref *kref) {
> > > > > + struct vfio_mm *vmm = container_of(kref, struct vfio_mm,
> kref);
> > > > > +
> > > > > + list_del(&vmm->vfio_next);
> > > > > + vfio_mm_unlock_and_free(vmm);
> > > > > +}
> > > > > +
> > > > > +void vfio_mm_put(struct vfio_mm *vmm) {
> > > > > + kref_put_mutex(&vmm->kref, vfio_mm_release,
> > > > > &vfio.vfio_mm_lock);
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(vfio_mm_put);
> > > > > +
> > > > > +/* Assume vfio_mm_lock or vfio_mm reference is held */ static
> > > > > +void vfio_mm_get(struct vfio_mm *vmm) {
> > > > > + kref_get(&vmm->kref);
> > > > > +}
> > > > > +
> > > > > +struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task) {
> > > > > + struct mm_struct *mm = get_task_mm(task);
> > > > > + struct vfio_mm *vmm;
> > > > > + unsigned long long val = (unsigned long long) mm;
> > > > > +
> > > > > + mutex_lock(&vfio.vfio_mm_lock);
> > > > > + list_for_each_entry(vmm, &vfio.vfio_mm_list, vfio_next) {
> > > > > + if (vmm->token.val == val) {
> > > > > + vfio_mm_get(vmm);
> > > > > + goto out;
> > > > > + }
> > > > > + }
> > > > > +
> > > > > + vmm = vfio_create_mm(mm);
> > > > > + if (IS_ERR(vmm))
> > > > > + vmm = NULL;
> > > > > +out:
> > > > > + mutex_unlock(&vfio.vfio_mm_lock);
> > > > > + mmput(mm);
> > > >
> > > > I assume this has been discussed before, but from readability p.o.v
> > > > it might be good to add a comment for this function to explain how
> > > > the recording of mm in vfio_mm can be correctly removed when the
> mm
> > > > is being destroyed, since we don't hold a reference of mm here.
> > >
> > > yeah, I'll add it.
> > >
> > > > > + return vmm;
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(vfio_mm_get_from_task);
> > > > > +
> > > > > +int vfio_mm_pasid_alloc(struct vfio_mm *vmm, int min, int max) {
> > > > > + ioasid_t pasid;
> > > > > + int ret = -ENOSPC;
> > > > > +
> > > > > + mutex_lock(&vmm->pasid_lock);
> > > > > +
> > > > > + pasid = ioasid_alloc(vmm->ioasid_sid, min, max, NULL);
> > > > > + if (pasid == INVALID_IOASID) {
> > > > > + ret = -ENOSPC;
> > > > > + goto out_unlock;
> > > > > + }
> > > > > +
> > > > > + ret = pasid;
> > > > > +out_unlock:
> > > > > + mutex_unlock(&vmm->pasid_lock);
> > > > > + return ret;
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(vfio_mm_pasid_alloc);
> > > > > +
> > > > > +int vfio_mm_pasid_free(struct vfio_mm *vmm, ioasid_t pasid) {
> > > > > + void *pdata;
> > > > > + int ret = 0;
> > > > > +
> > > > > + mutex_lock(&vmm->pasid_lock);
> > > > > + pdata = ioasid_find(vmm->ioasid_sid, pasid, NULL);
> > > > > + if (IS_ERR(pdata)) {
> > > > > + ret = PTR_ERR(pdata);
> > > > > + goto out_unlock;
> > > > > + }
> > > > > + ioasid_free(pasid);
> > > > > +
> > > > > +out_unlock:
> > > > > + mutex_unlock(&vmm->pasid_lock);
> > > > > + return ret;
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(vfio_mm_pasid_free);
> > > > > +
> > > > > +/**
> > > > > * Module/class support
> > > > > */
> > > > > static char *vfio_devnode(struct device *dev, umode_t *mode) @@
> > > > > -2151,8 +2279,10 @@ static int __init vfio_init(void)
> > > > > idr_init(&vfio.group_idr);
> > > > > mutex_init(&vfio.group_lock);
> > > > > mutex_init(&vfio.iommu_drivers_lock);
> > > > > + mutex_init(&vfio.vfio_mm_lock);
> > > > > INIT_LIST_HEAD(&vfio.group_list);
> > > > > INIT_LIST_HEAD(&vfio.iommu_drivers_list);
> > > > > + INIT_LIST_HEAD(&vfio.vfio_mm_list);
> > > > > init_waitqueue_head(&vfio.release_q);
> > > > >
> > > > > ret = misc_register(&vfio_dev);
> > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > > > b/drivers/vfio/vfio_iommu_type1.c index a177bf2..331ceee 100644
> > > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > > @@ -70,6 +70,7 @@ struct vfio_iommu {
> > > > > unsigned int dma_avail;
> > > > > bool v2;
> > > > > bool nesting;
> > > > > + struct vfio_mm *vmm;
> > > > > };
> > > > >
> > > > > struct vfio_domain {
> > > > > @@ -2018,6 +2019,7 @@ static void
> > > vfio_iommu_type1_detach_group(void
> > > > > *iommu_data,
> > > > > static void *vfio_iommu_type1_open(unsigned long arg) {
> > > > > struct vfio_iommu *iommu;
> > > > > + struct vfio_mm *vmm = NULL;
> > > > >
> > > > > iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);
> > > > > if (!iommu)
> > > > > @@ -2043,6 +2045,10 @@ static void
> > > *vfio_iommu_type1_open(unsigned
> > > > > long arg)
> > > > > iommu->dma_avail = dma_entry_limit;
> > > > > mutex_init(&iommu->lock);
> > > > > BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
> > > > > + vmm = vfio_mm_get_from_task(current);
> > > > > + if (!vmm)
> > > > > + pr_err("Failed to get vfio_mm track\n");
> > > >
> > > > I assume error should be returned when pr_err is used...
> > >
> > > got it. I didn't do it as I donât think vfio_mm is necessary for
> > > every iommu open. It is necessary for the nesting type iommu. I'll
> > > make it fetch vmm when it is opening nesting type and return error
> > > if failed.
> >
> > sounds good.
> >
> > >
> > > > > + iommu->vmm = vmm;
> > > > >
> > > > > return iommu;
> > > > > }
> > > > > @@ -2084,6 +2090,8 @@ static void vfio_iommu_type1_release(void
> > > > > *iommu_data)
> > > > > }
> > > > >
> > > > > vfio_iommu_iova_free(&iommu->iova_list);
> > > > > + if (iommu->vmm)
> > > > > + vfio_mm_put(iommu->vmm);
> > > > >
> > > > > kfree(iommu);
> > > > > }
> > > > > @@ -2172,6 +2180,55 @@ static int
> vfio_iommu_iova_build_caps(struct
> > > > > vfio_iommu *iommu,
> > > > > return ret;
> > > > > }
> > > > >
> > > > > +static bool vfio_iommu_type1_pasid_req_valid(u32 flags)
> > > >
> > > > I don't think you need prefix "vfio_iommu_type1" for every new
> > > > function here, especially for leaf internal function as this one.
> > >
> > > got it. thanks.
> > >
> > > > > +{
> > > > > + return !((flags & ~VFIO_PASID_REQUEST_MASK) ||
> > > > > + (flags & VFIO_IOMMU_PASID_ALLOC &&
> > > > > + flags & VFIO_IOMMU_PASID_FREE));
> > > > > +}
> > > > > +
> > > > > +static int vfio_iommu_type1_pasid_alloc(struct vfio_iommu *iommu,
> > > > > + int min,
> > > > > + int max)
> > > > > +{
> > > > > + struct vfio_mm *vmm = iommu->vmm;
> > > > > + int ret = 0;
> > > > > +
> > > > > + mutex_lock(&iommu->lock);
> > > > > + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
> > > > > + ret = -EFAULT;
> > > >
> > > > why -EFAULT?
> > >
> > > well, it's from a prior comment as below:
> > > vfio_mm_pasid_alloc() can return -ENOSPC though, so it'd be nice to
> > > differentiate the errors. We could use EFAULT for the no IOMMU case
> > > and EINVAL here?
> > > http://lkml.iu.edu/hypermail/linux/kernel/2001.3/05964.html
> > >
> > > >
> > > > > + goto out_unlock;
> > > > > + }
> > > > > + if (vmm)
> > > > > + ret = vfio_mm_pasid_alloc(vmm, min, max);
> > > > > + else
> > > > > + ret = -EINVAL;
> > > > > +out_unlock:
> > > > > + mutex_unlock(&iommu->lock);
> > > > > + return ret;
> > > > > +}
> > > > > +
> > > > > +static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
> > > > > + unsigned int pasid)
> > > > > +{
> > > > > + struct vfio_mm *vmm = iommu->vmm;
> > > > > + int ret = 0;
> > > > > +
> > > > > + mutex_lock(&iommu->lock);
> > > > > + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
> > > > > + ret = -EFAULT;
> > > >
> > > > ditto
> > > >
> > > > > + goto out_unlock;
> > > > > + }
> > > > > +
> > > > > + if (vmm)
> > > > > + ret = vfio_mm_pasid_free(vmm, pasid);
> > > > > + else
> > > > > + ret = -EINVAL;
> > > > > +out_unlock:
> > > > > + mutex_unlock(&iommu->lock);
> > > > > + return ret;
> > > > > +}
> > > > > +
> > > > > static long vfio_iommu_type1_ioctl(void *iommu_data,
> > > > > unsigned int cmd, unsigned long arg)
> > > > > {
> > > > > @@ -2276,6 +2333,53 @@ static long vfio_iommu_type1_ioctl(void
> > > > > *iommu_data,
> > > > >
> > > > > return copy_to_user((void __user *)arg, &unmap, minsz) ?
> > > > > -EFAULT : 0;
> > > > > +
> > > > > + } else if (cmd == VFIO_IOMMU_PASID_REQUEST) {
> > > > > + struct vfio_iommu_type1_pasid_request req;
> > > > > + unsigned long offset;
> > > > > +
> > > > > + minsz = offsetofend(struct
> vfio_iommu_type1_pasid_request,
> > > > > + flags);
> > > > > +
> > > > > + if (copy_from_user(&req, (void __user *)arg, minsz))
> > > > > + return -EFAULT;
> > > > > +
> > > > > + if (req.argsz < minsz ||
> > > > > + !vfio_iommu_type1_pasid_req_valid(req.flags))
> > > > > + return -EINVAL;
> > > > > +
> > > > > + if (copy_from_user((void *)&req + minsz,
> > > > > + (void __user *)arg + minsz,
> > > > > + sizeof(req) - minsz))
> > > > > + return -EFAULT;
> > > >
> > > > why copying in two steps instead of copying them together?
> > >
> > > just want to do sanity check before copying all the data. I
> > > can move it as one copy if it's better. :-)
> >
> > it's possible fine. I just saw you did same thing for other uapis.
> >
> > >
> > > > > +
> > > > > + switch (req.flags & VFIO_PASID_REQUEST_MASK) {
> > > > > + case VFIO_IOMMU_PASID_ALLOC:
> > > > > + {
> > > > > + int ret = 0, result;
> > > > > +
> > > > > + result =
> vfio_iommu_type1_pasid_alloc(iommu,
> > > > > +
> req.alloc_pasid.min,
> > > > > +
> req.alloc_pasid.max);
> > > > > + if (result > 0) {
> > > > > + offset = offsetof(
> > > > > + struct
> > > > > vfio_iommu_type1_pasid_request,
> > > > > + alloc_pasid.result);
> > > > > + ret = copy_to_user(
> > > > > + (void __user *) (arg +
> offset),
> > > > > + &result, sizeof(result));
> > > > > + } else {
> > > > > + pr_debug("%s: PASID alloc failed\n",
> > > > > __func__);
> > > > > + ret = -EFAULT;
> > > >
> > > > no, this branch is not for copy_to_user error. it is about pasid alloc
> > > > failure. you should handle both.
> > >
> > > Emmm, I just want to fail the IOCTL in such case, so the @result field
> > > is meaningless in the user side. How about using another return value
> > > (e.g. ENOSPC) to indicate the IOCTL failure?
> >
> > If pasid_alloc fails, you return its result to userspace
> > if copy_to_user fails, then return -EFAULT.
> >
> > however, above you return -EFAULT for pasid_alloc failure, and
> > then the number of not-copied bytes for copy_to_user.
>
> not quite get. Let me re-paste the code. :-)
>
> + case VFIO_IOMMU_PASID_ALLOC:
> + {
> + int ret = 0, result;
> +
> + result = vfio_iommu_type1_pasid_alloc(iommu,
> + req.alloc_pasid.min,
> + req.alloc_pasid.max);
> + if (result > 0) {
> + offset = offsetof(
> + struct
> vfio_iommu_type1_pasid_request,
> + alloc_pasid.result);
> + ret = copy_to_user(
> + (void __user *) (arg + offset),
> + &result, sizeof(result));
> if copy_to_user failed, ret is the number of uncopied bytes and
> will be returned to userspace to indicate failure. userspace will
> not use the data in result field. perhaps, I should check the ret
> here and return -EFAULT or another errno, instead of return the
> number of un-copied bytes.
here should return -EFAULT.
> + } else {
> + pr_debug("%s: PASID alloc failed\n",
> __func__);
> + ret = -EFAULT;
> if vfio_iommu_type1_pasid_alloc() failed, no doubt, return -EFAULT
> to userspace to indicate failure.
pasid_alloc has its own error types returned. why blindly replace it
with -EFAULT?
> + }
> + return ret;
> + }
>
> is there still porblem here?
> > >
> > > > > + }
> > > > > + return ret;
> > > > > + }
> > > > > + case VFIO_IOMMU_PASID_FREE:
> > > > > + return vfio_iommu_type1_pasid_free(iommu,
> > > > > +
> req.free_pasid);
> > > > > + default:
> > > > > + return -EINVAL;
> > > > > + }
> > > > > }
> > > > >
> > > > > return -ENOTTY;
> > > > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > > > > index e42a711..75f9f7f1 100644
> > > > > --- a/include/linux/vfio.h
> > > > > +++ b/include/linux/vfio.h
> > > > > @@ -89,6 +89,26 @@ extern int vfio_register_iommu_driver(const
> struct
> > > > > vfio_iommu_driver_ops *ops);
> > > > > extern void vfio_unregister_iommu_driver(
> > > > > const struct vfio_iommu_driver_ops *ops);
> > > > >
> > > > > +#define VFIO_DEFAULT_PASID_QUOTA 1000
> > > > > +struct vfio_mm_token {
> > > > > + unsigned long long val;
> > > > > +};
> > > > > +
> > > > > +struct vfio_mm {
> > > > > + struct kref kref;
> > > > > + struct vfio_mm_token token;
> > > > > + int ioasid_sid;
> > > > > + /* protect @pasid_quota field and pasid allocation/free */
> > > > > + struct mutex pasid_lock;
> > > > > + int pasid_quota;
> > > > > + struct list_head vfio_next;
> > > > > +};
> > > > > +
> > > > > +extern struct vfio_mm *vfio_mm_get_from_task(struct task_struct
> > > *task);
> > > > > +extern void vfio_mm_put(struct vfio_mm *vmm);
> > > > > +extern int vfio_mm_pasid_alloc(struct vfio_mm *vmm, int min, int
> max);
> > > > > +extern int vfio_mm_pasid_free(struct vfio_mm *vmm, ioasid_t
> pasid);
> > > > > +
> > > > > /*
> > > > > * External user API
> > > > > */
> > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > > > index 9e843a1..298ac80 100644
> > > > > --- a/include/uapi/linux/vfio.h
> > > > > +++ b/include/uapi/linux/vfio.h
> > > > > @@ -794,6 +794,47 @@ struct vfio_iommu_type1_dma_unmap {
> > > > > #define VFIO_IOMMU_ENABLE _IO(VFIO_TYPE, VFIO_BASE + 15)
> > > > > #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16)
> > > > >
> > > > > +/*
> > > > > + * PASID (Process Address Space ID) is a PCIe concept which
> > > > > + * has been extended to support DMA isolation in fine-grain.
> > > > > + * With device assigned to user space (e.g. VMs), PASID alloc
> > > > > + * and free need to be system wide. This structure defines
> > > > > + * the info for pasid alloc/free between user space and kernel
> > > > > + * space.
> > > > > + *
> > > > > + * @flag=VFIO_IOMMU_PASID_ALLOC, refer to the @alloc_pasid
> > > > > + * @flag=VFIO_IOMMU_PASID_FREE, refer to @free_pasid
> > > > > + */
> > > > > +struct vfio_iommu_type1_pasid_request {
> > > > > + __u32 argsz;
> > > > > +#define VFIO_IOMMU_PASID_ALLOC (1 << 0)
> > > > > +#define VFIO_IOMMU_PASID_FREE (1 << 1)
> > > > > + __u32 flags;
> > > > > + union {
> > > > > + struct {
> > > > > + __u32 min;
> > > > > + __u32 max;
> > > > > + __u32 result;
> > > >
> > > > result->pasid?
> > >
> > > yes, the pasid allocated.
> > >
> > > >
> > > > > + } alloc_pasid;
> > > > > + __u32 free_pasid;
> > > >
> > > > what about putting a common pasid field after flags?
> > >
> > > looks good to me. But it would make the union part only meaningful
> > > to alloc pasid. if so, maybe make the union part as a data field and
> > > only alloc pasid will have it. For free pasid, it is not necessary
> > > to read it from userspace. does it look good?
> >
> > maybe keeping the union is also OK, just with {min, max} for alloc.
> > who knows whether more pasid ops will be added in the future
> > which may require its specific union structure. ?? putting pasid
> > as a common field is reasonable because the whole cmd is for
> > pasid.
>
> got it.
>
> Thanks,
> Yi Liu