RE: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

From: Liu, Yi L
Date: Fri Oct 15 2021 - 05:18:26 EST


> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Tuesday, September 21, 2021 11:42 PM
>
> On Sun, Sep 19, 2021 at 02:38:29PM +0800, Liu Yi L wrote:
> > /dev/iommu aims to provide a unified interface for managing I/O address
> > spaces for devices assigned to userspace. This patch adds the initial
> > framework to create a /dev/iommu node. Each open of this node returns an
> > iommufd. And this fd is the handle for userspace to initiate its I/O
> > address space management.
> >
> > One open:
> > - We call this feature as IOMMUFD in Kconfig in this RFC. However this
> > name is not clear enough to indicate its purpose to user. Back to 2010
> > vfio even introduced a /dev/uiommu [1] as the predecessor of its
> > container concept. Is that a better name? Appreciate opinions here.
> >
> > [1]
> https://lore.kernel.org/kvm/4c0eb470.1HMjondO00NIvFM6%25pugs@xxxxxxxx
> m/
> >
> > Signed-off-by: Liu Yi L <yi.l.liu@xxxxxxxxx>
> > drivers/iommu/Kconfig | 1 +
> > drivers/iommu/Makefile | 1 +
> > drivers/iommu/iommufd/Kconfig | 11 ++++
> > drivers/iommu/iommufd/Makefile | 2 +
> > drivers/iommu/iommufd/iommufd.c | 112
> ++++++++++++++++++++++++++++++++
> > 5 files changed, 127 insertions(+)
> > create mode 100644 drivers/iommu/iommufd/Kconfig
> > create mode 100644 drivers/iommu/iommufd/Makefile
> > create mode 100644 drivers/iommu/iommufd/iommufd.c
> >
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index 07b7c25cbed8..a83ce0acd09d 100644
> > +++ b/drivers/iommu/Kconfig
> > @@ -136,6 +136,7 @@ config MSM_IOMMU
> >
> > source "drivers/iommu/amd/Kconfig"
> > source "drivers/iommu/intel/Kconfig"
> > +source "drivers/iommu/iommufd/Kconfig"
> >
> > config IRQ_REMAP
> > bool "Support for Interrupt Remapping"
> > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> > index c0fb0ba88143..719c799f23ad 100644
> > +++ b/drivers/iommu/Makefile
> > @@ -29,3 +29,4 @@ obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
> > obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> > obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o io-pgfault.o
> > obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
> > +obj-$(CONFIG_IOMMUFD) += iommufd/
> > diff --git a/drivers/iommu/iommufd/Kconfig
> b/drivers/iommu/iommufd/Kconfig
> > new file mode 100644
> > index 000000000000..9fb7769a815d
> > +++ b/drivers/iommu/iommufd/Kconfig
> > @@ -0,0 +1,11 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +config IOMMUFD
> > + tristate "I/O Address Space management framework for passthrough
> devices"
> > + select IOMMU_API
> > + default n
> > + help
> > + provides unified I/O address space management framework for
> > + isolating untrusted DMAs via devices which are passed through
> > + to userspace drivers.
> > +
> > + If you don't know what to do here, say N.
> > diff --git a/drivers/iommu/iommufd/Makefile
> b/drivers/iommu/iommufd/Makefile
> > new file mode 100644
> > index 000000000000..54381a01d003
> > +++ b/drivers/iommu/iommufd/Makefile
> > @@ -0,0 +1,2 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +obj-$(CONFIG_IOMMUFD) += iommufd.o
> > diff --git a/drivers/iommu/iommufd/iommufd.c
> b/drivers/iommu/iommufd/iommufd.c
> > new file mode 100644
> > index 000000000000..710b7e62988b
> > +++ b/drivers/iommu/iommufd/iommufd.c
> > @@ -0,0 +1,112 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * I/O Address Space Management for passthrough devices
> > + *
> > + * Copyright (C) 2021 Intel Corporation
> > + *
> > + * Author: Liu Yi L <yi.l.liu@xxxxxxxxx>
> > + */
> > +
> > +#define pr_fmt(fmt) "iommufd: " fmt
> > +
> > +#include <linux/file.h>
> > +#include <linux/fs.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/miscdevice.h>
> > +#include <linux/mutex.h>
> > +#include <linux/iommu.h>
> > +
> > +/* Per iommufd */
> > +struct iommufd_ctx {
> > + refcount_t refs;
> > +};
>
> A private_data of a struct file should avoid having a refcount (and
> this should have been a kref anyhow)
>
> Use the refcount on the struct file instead.
>
> In general the lifetime models look overly convoluted to me with
> refcounts being used as locks and going in all manner of directions.
>
> - No refcount on iommufd_ctx, this should use the fget on the fd.
> The driver facing version of the API has the driver holds a fget
> inside the iommufd_device.
>
> - Put a rwlock inside the iommufd_ioas that is a
> 'destroying_lock'. The rwlock starts out unlocked.
>
> Acquire from the xarray is
> rcu_lock()
> ioas = xa_load()
> if (ioas)
> if (down_read_trylock(&ioas->destroying_lock))

all good suggestions, will refine accordingly. Here destroying_lock is a
rw_semaphore. right? Since down_read_trylock() accepts a rwsem.

Thanks,
Yi Liu

> // success
> Unacquire is just up_read()
>
> Do down_write when the ioas is to be destroyed, do not return ebusy.
>
> - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc does
> not need locking (order it properly too, it is in the wrong order), and
> don't check for duplicate devices or dev_cookie duplication, that
> is user error and is harmless to the kernel.
>
> > +static int iommufd_fops_release(struct inode *inode, struct file *filep)
> > +{
> > + struct iommufd_ctx *ictx = filep->private_data;
> > +
> > + filep->private_data = NULL;
>
> unnecessary
>
> > + iommufd_ctx_put(ictx);
> > +
> > + return 0;
> > +}
> > +
> > +static long iommufd_fops_unl_ioctl(struct file *filep,
> > + unsigned int cmd, unsigned long arg)
> > +{
> > + struct iommufd_ctx *ictx = filep->private_data;
> > + long ret = -EINVAL;
> > +
> > + if (!ictx)
> > + return ret;
>
> impossible
>
> > +
> > + switch (cmd) {
> > + default:
> > + pr_err_ratelimited("unsupported cmd %u\n", cmd);
>
> don't log user triggerable events
>
> Jason