Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces

From: David Gibson
Date: Wed Sep 29 2021 - 00:56:06 EST


On Sun, Sep 19, 2021 at 02:38:34PM +0800, Liu Yi L wrote:
> From: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
>
> This extends iommu core to manage security context for passthrough
> devices. Please bear a long explanation for how we reach this design
> instead of managing it solely in iommufd like what vfio does today.
>
> Devices which cannot be isolated from each other are organized into an
> iommu group. When a device is assigned to the user space, the entire
> group must be put in a security context so that user-initiated DMAs via
> the assigned device cannot harm the rest of the system. No user access
> should be granted on a device before the security context is established
> for the group which the device belongs to.
>
> Managing the security context must meet below criteria:
>
> 1) The group is viable for user-initiated DMAs. This implies that the
> devices in the group must be either bound to a device-passthrough
> framework, or driver-less, or bound to a driver which is known safe
> (not do DMA).
>
> 2) The security context should only allow DMA to the user's memory and
> devices in this group;
>
> 3) After the security context is established for the group, the group
> viability must be continuously monitored before the user relinquishes
> all devices belonging to the group. The viability might be broken e.g.
> when a driver-less device is later bound to a driver which does DMA.
>
> 4) The security context should not be destroyed before user access
> permission is withdrawn.
>
> Existing vfio introduces explicit container/group semantics in its uAPI
> to meet above requirements. A single security context (iommu domain)
> is created per container. Attaching group to container moves the entire
> group into the associated security context, and vice versa. The user can
> open the device only after group attach. A group can be detached only
> after all devices in the group are closed. Group viability is monitored
> by listening to iommu group events.
>
> Unlike vfio, iommufd adopts a device-centric design with all group
> logistics hidden behind the fd. Binding a device to iommufd serves
> as the contract to get security context established (and vice versa
> for unbinding). One additional requirement in iommufd is to manage the
> switch between multiple security contexts due to decoupled bind/attach:
>
> 1) Open a device in "/dev/vfio/devices" with user access blocked;

Probably worth clarifying that (1) must happen for *all* devices in
the group before (2) happens for any device in the group.

> 2) Bind the device to an iommufd with an initial security context
> (an empty iommu domain which blocks dma) established for its
> group, with user access unblocked;
>
> 3) Attach the device to a user-specified ioasid (shared by all devices
> attached to this ioasid). Before attaching, the device should be first
> detached from the initial context;

So, this step can implicitly but observably change the behaviour for
other devices in the group as well. I don't love that kind of
difficult to predict side effect, which is why I'm *still* not totally
convinced by the device-centric model.

> 4) Detach the device from the ioasid and switch it back to the initial
> security context;

Same non-local side effect at this step, of course.

Btw, explicitly naming the "no DMA" context is probably a good idea,
rather than referring to the "initial security context" (it's
"initial" from the PoV of the iommufd, but not from the PoV of the
device fd which was likely bound to the default kernel context before
(2)).

> 5) Unbind the device from the iommufd, back to access blocked state and
> move its group out of the initial security context if it's the last
> unbound device in the group;

Maybe worth clarifying that again (5) must happen for all devices in
the group before rebiding any devices to regular kernel drivers.
>
> (multiple attach/detach could happen between 2 and 5).
>
> However existing iommu core has problem with above transition. Detach
> in step 3/4 makes the device/group re-attached to the default domain
> automatically, which opens the door for user-initiated DMAs to attack
> the rest of the system. The existing vfio doesn't have this problem as
> it combines 2/3 in one step (so does 4/5).
>
> Fixing this problem requires the iommu core to also participate in the
> security context management. Following this direction we also move group
> viability check into the iommu core, which allows iommufd to stay fully
> device-centric w/o keeping any group knowledge (combining with the
> extension to iommu_at[de]tach_device() in a latter patch).
>
> Basically two new interfaces are provided:
>
> int iommu_device_init_user_dma(struct device *dev,
> unsigned long owner);
> void iommu_device_exit_user_dma(struct device *dev);
>
> iommufd calls them respectively when handling device binding/unbinding
> requests.
>
> The init_user_dma() for the 1st device in a group marks the entire group
> for user-dma and establishes the initial security context (dma blocked)
> according to aforementioned criteria. As long as the group is marked for
> user-dma, auto-reattaching to default domain is disabled. Instead, upon
> detaching the group is moved back to the initial security context.
>
> The caller also provides an owner id to mark the ownership so inadvertent
> attempt from another caller on the same device can be captured. In this
> RFC iommufd will use the fd context pointer as the owner id.
>
> The exit_user_dma() for the last device in the group clears the user-dma
> mark and moves the group back to the default domain.
>
> Signed-off-by: Kevin Tian <kevin.tian@xxxxxxxxx>
> Signed-off-by: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> ---
> drivers/iommu/iommu.c | 145 +++++++++++++++++++++++++++++++++++++++++-
> include/linux/iommu.h | 12 ++++
> 2 files changed, 154 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 5ea3a007fd7c..bffd84e978fb 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -45,6 +45,8 @@ struct iommu_group {
> struct iommu_domain *default_domain;
> struct iommu_domain *domain;
> struct list_head entry;
> + unsigned long user_dma_owner_id;

Using an opaque integer doesn't seem like a good idea. I think you
probably want a pointer to a suitable struct dma_owner or the like
(you could have one embedded in each iommufd struct, plus a global
static kernel_default_owner).

> + refcount_t owner_cnt;
> };
>
> struct group_device {
> @@ -86,6 +88,7 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
> static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
> static ssize_t iommu_group_store_type(struct iommu_group *group,
> const char *buf, size_t count);
> +static bool iommu_group_user_dma_viable(struct iommu_group *group);
>
> #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store) \
> struct iommu_group_attribute iommu_group_attr_##_name = \
> @@ -275,7 +278,11 @@ int iommu_probe_device(struct device *dev)
> */
> iommu_alloc_default_domain(group, dev);
>
> - if (group->default_domain) {
> + /*
> + * If any device in the group has been initialized for user dma,
> + * avoid attaching the default domain.
> + */
> + if (group->default_domain && !group->user_dma_owner_id) {
> ret = __iommu_attach_device(group->default_domain, dev);
> if (ret) {
> iommu_group_put(group);
> @@ -1664,6 +1671,17 @@ static int iommu_bus_notifier(struct notifier_block *nb,
> group_action = IOMMU_GROUP_NOTIFY_BIND_DRIVER;
> break;
> case BUS_NOTIFY_BOUND_DRIVER:
> + /*
> + * FIXME: Alternatively the attached drivers could generically
> + * indicate to the iommu layer that they are safe for keeping
> + * the iommu group user viable by calling some function around
> + * probe(). We could eliminate this gross BUG_ON() by denying
> + * probe to non-iommu-safe driver.
> + */
> + mutex_lock(&group->mutex);
> + if (group->user_dma_owner_id)
> + BUG_ON(!iommu_group_user_dma_viable(group));
> + mutex_unlock(&group->mutex);
> group_action = IOMMU_GROUP_NOTIFY_BOUND_DRIVER;
> break;
> case BUS_NOTIFY_UNBIND_DRIVER:
> @@ -2304,7 +2322,11 @@ static int __iommu_attach_group(struct iommu_domain *domain,
> {
> int ret;
>
> - if (group->default_domain && group->domain != group->default_domain)
> + /*
> + * group->domain could be NULL when a domain is detached from the
> + * group but the default_domain is not re-attached.
> + */
> + if (group->domain && group->domain != group->default_domain)
> return -EBUSY;
>
> ret = __iommu_group_for_each_dev(group, domain,
> @@ -2341,7 +2363,11 @@ static void __iommu_detach_group(struct iommu_domain *domain,
> {
> int ret;
>
> - if (!group->default_domain) {
> + /*
> + * If any device in the group has been initialized for user dma,
> + * avoid re-attaching the default domain.
> + */
> + if (!group->default_domain || group->user_dma_owner_id) {
> __iommu_group_for_each_dev(group, domain,
> iommu_group_do_detach_device);
> group->domain = NULL;
> @@ -3276,3 +3302,116 @@ int iommu_device_get_info(struct device *dev, enum iommu_devattr attr, void *dat
> return ops->device_info(dev, attr, data);
> }
> EXPORT_SYMBOL_GPL(iommu_device_get_info);
> +
> +/*
> + * IOMMU core interfaces for iommufd.
> + */
> +
> +/*
> + * FIXME: We currently simply follow vifo policy to mantain the group's
> + * viability to user. Eventually, we should avoid below hard-coded list
> + * by letting drivers indicate to the iommu layer that they are safe for
> + * keeping the iommu group's user aviability.
> + */
> +static const char * const iommu_driver_allowed[] = {
> + "vfio-pci",
> + "pci-stub"
> +};
> +
> +/*
> + * An iommu group is viable for use by userspace if all devices are in
> + * one of the following states:
> + * - driver-less
> + * - bound to an allowed driver
> + * - a PCI interconnect device
> + */
> +static int device_user_dma_viable(struct device *dev, void *data)

I think this wants a "less friendly" more obviously local name.
Really the only safe way to call this is via
iommu_group_user_dma_viable(), which isn't obvious from this name.

> +{
> + struct device_driver *drv = READ_ONCE(dev->driver);
> +
> + if (!drv)
> + return 0;
> +
> + if (dev_is_pci(dev)) {
> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
> + return 0;
> + }
> +
> + return match_string(iommu_driver_allowed,
> + ARRAY_SIZE(iommu_driver_allowed),
> + drv->name) < 0;
> +}

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature