Re: [RFC] /dev/ioasid uAPI proposal

From: David Gibson
Date: Wed Jun 02 2021 - 03:29:36 EST


On Fri, May 28, 2021 at 08:36:49PM -0300, Jason Gunthorpe wrote:
> On Thu, May 27, 2021 at 07:58:12AM +0000, Tian, Kevin wrote:
>
> > 2.1. /dev/ioasid uAPI
> > +++++++++++++++++
> >
> > /*
> > * Check whether a uAPI extension is supported.
> > *
> > * This is for FD-level capabilities, such as locked page pre-registration.
> > * IOASID-level capabilities are reported through IOASID_GET_INFO.
> > *
> > * Return: 0 if not supported, 1 if supported.
> > */
> > #define IOASID_CHECK_EXTENSION _IO(IOASID_TYPE, IOASID_BASE + 0)
>
>
> > /*
> > * Register user space memory where DMA is allowed.
> > *
> > * It pins user pages and does the locked memory accounting so sub-
> > * sequent IOASID_MAP/UNMAP_DMA calls get faster.
> > *
> > * When this ioctl is not used, one user page might be accounted
> > * multiple times when it is mapped by multiple IOASIDs which are
> > * not nested together.
> > *
> > * Input parameters:
> > * - vaddr;
> > * - size;
> > *
> > * Return: 0 on success, -errno on failure.
> > */
> > #define IOASID_REGISTER_MEMORY _IO(IOASID_TYPE, IOASID_BASE + 1)
> > #define IOASID_UNREGISTER_MEMORY _IO(IOASID_TYPE, IOASID_BASE + 2)
>
> So VA ranges are pinned and stored in a tree and later references to
> those VA ranges by any other IOASID use the pin cached in the tree?
>
> It seems reasonable and is similar to the ioasid parent/child I
> suggested for PPC.
>
> IMHO this should be merged with the all SW IOASID that is required for
> today's mdev drivers. If this can be done while keeping this uAPI then
> great, otherwise I don't think it is so bad to weakly nest a physical
> IOASID under a SW one just to optimize page pinning.

Right, I think we can simplify the interface by modelling the
preregistration as a nesting layer. Well, mostly... the wrinkle is
that generally you can't do anything with an ioasid until you've
attached devices to it, but that doesn't really make sense for the
prereg layer. I expect we can find some way to deal with that,
though.

Actually... to simplify that "weak nesting" concept I wonder if we
want to expand to 3 ways of specifying the pagetables for the ioasid:
1) kernel managed (MAP/UNMAP)
2) user managed (BIND/INVALIDATE)
3) pass-through (IOVA==parent address)

Obviously pass-through wouldn't be allowed in all circumstances.
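
To make the three options concrete, here is a minimal sketch in C. All
of the names are hypothetical and none of them are part of the proposed
uAPI. The rule that pass-through needs a parent is only an assumption
drawn from "IOVA==parent address"; the real restrictions would likely
be broader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not part of the proposed uAPI. */
enum ioasid_pgtable_mode {
	IOASID_PGTABLE_KERNEL,      /* 1) kernel managed (MAP/UNMAP) */
	IOASID_PGTABLE_USER,        /* 2) user managed (BIND/INVALIDATE) */
	IOASID_PGTABLE_PASSTHROUGH, /* 3) pass-through (IOVA==parent address) */
};

struct ioasid_sketch {
	enum ioasid_pgtable_mode mode;
	const struct ioasid_sketch *parent; /* NULL for a root ioasid */
};

/*
 * Assumed rule: pass-through only makes sense when there is a parent
 * address space to pass IOVAs through to. The other two modes are
 * accepted with or without a parent.
 */
static bool ioasid_mode_valid(const struct ioasid_sketch *ioasid)
{
	assert(ioasid != NULL);

	if (ioasid->mode == IOASID_PGTABLE_PASSTHROUGH)
		return ioasid->parent != NULL;

	return true;
}
```

With this shape, the "weak nesting" case would be a pass-through ioasid
whose parent is the kernel-managed prereg layer.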

> Either way this seems like a smart direction
>
> > /*
> > * Allocate an IOASID.
> > *
> > * IOASID is the FD-local software handle representing an I/O address
> > * space. Each IOASID is associated with a single I/O page table. User
> > * must call this ioctl to get an IOASID for every I/O address space that is
> > * intended to be enabled in the IOMMU.
> > *
> > * A newly-created IOASID doesn't accept any command before it is
> > * attached to a device. Once attached, an empty I/O page table is
> > * bound with the IOMMU then the user could use either DMA mapping
> > * or pgtable binding commands to manage this I/O page table.
>
> Can the IOASID be populated before being attached?

I don't think it reasonably can. Until attached, you don't actually
know what hardware IOMMU will be backing it, and therefore you don't
know its capabilities. You can't really allow mappings if you don't
even know allowed IOVA ranges and page size.
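
As a rough illustration of why, here is a sketch of the checks a map
request would need. The struct and function names are hypothetical, and
the page size bitmap encoding (bit n set means 2^n byte pages) is an
assumption for the example. Every check depends on attributes that only
exist once an IOMMU is known.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: attributes known only once a device is attached. */
struct iommu_attrs_sketch {
	uint64_t iova_start;    /* first usable IOVA */
	uint64_t iova_end;      /* last usable IOVA, inclusive */
	uint64_t pgsize_bitmap; /* bit n set => 2^n byte pages supported */
};

/* attrs == NULL models an ioasid with no device attached yet. */
static bool map_request_ok(const struct iommu_attrs_sketch *attrs,
			   uint64_t iova, uint64_t size)
{
	uint64_t min_pgsize;

	if (attrs == NULL)
		return false; /* nothing to validate against */

	assert(attrs->iova_start <= attrs->iova_end);

	if (attrs->pgsize_bitmap == 0 || size == 0)
		return false;

	/* Lowest set bit is the smallest supported page size. */
	min_pgsize = attrs->pgsize_bitmap & (~attrs->pgsize_bitmap + 1);
	if ((iova | size) & (min_pgsize - 1))
		return false; /* misaligned for this IOMMU */

	if (iova < attrs->iova_start || iova > attrs->iova_end)
		return false; /* starts outside the allowed range */

	/* Overflow-safe check that the last byte is still in range. */
	return size - 1 <= attrs->iova_end - iova;
}
```

Before attach there is no attrs to pass in, so every request has to be
refused or deferred.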

> > * Device attachment is initiated through device driver uAPI (e.g. VFIO)
> > *
> > * Return: allocated ioasid on success, -errno on failure.
> > */
> > #define IOASID_ALLOC _IO(IOASID_TYPE, IOASID_BASE + 3)
> > #define IOASID_FREE _IO(IOASID_TYPE, IOASID_BASE + 4)
>
> I assume alloc will include quite a big structure to satisfy the
> various vendor needs?
>
> > /*
> > * Get information about an I/O address space
> > *
> > * Supported capabilities:
> > * - VFIO type1 map/unmap;
> > * - pgtable/pasid_table binding
> > * - hardware nesting vs. software nesting;
> > * - ...
> > *
> > * Related attributes:
> > * - supported page sizes, reserved IOVA ranges (DMA mapping);
> > * - vendor pgtable formats (pgtable binding);
> > * - number of child IOASIDs (nesting);
> > * - ...
> > *
> > * Above information is available only after one or more devices are
> > * attached to the specified IOASID. Otherwise the IOASID is just a
> > * number w/o any capability or attribute.
>
> This feels wrong to learn most of these attributes of the IOASID after
> attaching to a device.

Yes... but as above, we have no idea what the IOMMU's capabilities are
until devices are attached.

> The user should have some idea how it intends to use the IOASID when
> it creates it and the rest of the system should match the intention.
>
> For instance if the user is creating a IOASID to cover the guest GPA
> with the intention of making children it should indicate this during
> alloc.
>
> If the user is intending to point a child IOASID to a guest page table
> in a certain descriptor format then it should indicate it during
> alloc.
>
> device bind should fail if the device somehow isn't compatible with
> the scheme the user is trying to use.

[snip]
> > 2.2. /dev/vfio uAPI
> > ++++++++++++++++
>
> To be clear you mean the 'struct vfio_device' API, these are not
> IOCTLs on the container or group?
>
> > /*
> > * Bind a vfio_device to the specified IOASID fd
> > *
> > * Multiple vfio devices can be bound to a single ioasid_fd, but a single
> > * vfio device should not be bound to multiple ioasid_fd's.
> > *
> > * Input parameters:
> > * - ioasid_fd;
> > *
> > * Return: 0 on success, -errno on failure.
> > */
> > #define VFIO_BIND_IOASID_FD _IO(VFIO_TYPE, VFIO_BASE + 22)
> > #define VFIO_UNBIND_IOASID_FD _IO(VFIO_TYPE, VFIO_BASE + 23)
>
> This is where it would make sense to have an output "device id" that
> allows /dev/ioasid to refer to this "device" by number in events and
> other related things.

The group number could be used for that, even if there are no group
fds. You generally can't identify things more narrowly than group
anyway.


--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
