Re: [RFC 11/20] iommu/iommufd: Add IOMMU_IOASID_ALLOC/FREE

From: david@xxxxxxxxxxxxxxxxxxxxx
Date: Sat Oct 02 2021 - 02:13:57 EST


On Fri, Oct 01, 2021 at 09:25:05AM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 01, 2021 at 04:19:22PM +1000, david@xxxxxxxxxxxxxxxxxxxxx wrote:
> > On Wed, Sep 22, 2021 at 11:09:11AM -0300, Jason Gunthorpe wrote:
> > > On Wed, Sep 22, 2021 at 03:40:25AM +0000, Tian, Kevin wrote:
> > > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > > > Sent: Wednesday, September 22, 2021 1:45 AM
> > > > >
> > > > > On Sun, Sep 19, 2021 at 02:38:39PM +0800, Liu Yi L wrote:
> > > > > > This patch adds IOASID allocation/free interface per iommufd. When
> > > > > > allocating an IOASID, userspace is expected to specify the type and
> > > > > > format information for the target I/O page table.
> > > > > >
> > > > > > This RFC supports only one type (IOMMU_IOASID_TYPE_KERNEL_TYPE1V2),
> > > > > > implying a kernel-managed I/O page table with vfio type1v2 mapping
> > > > > > semantics. For this type the user should specify the addr_width of
> > > > > > the I/O address space and whether the I/O page table is created in
> > > > > > > an iommu enforce_snoop format. enforce_snoop must be true at this point,
> > > > > > as the false setting requires additional contract with KVM on handling
> > > > > > WBINVD emulation, which can be added later.
> > > > > >
> > > > > > Userspace is expected to call IOMMU_CHECK_EXTENSION (see next patch)
> > > > > > for what formats can be specified when allocating an IOASID.
> > > > > >
> > > > > > Open:
> > > > > > - Devices on PPC platform currently use a different iommu driver in vfio.
> > > > > > Per previous discussion they can also use vfio type1v2 as long as there
> > > > > > is a way to claim a specific iova range from a system-wide address space.
> > > > > > > This requirement doesn't sound PPC specific, as addr_width for pci devices
> > > > > > > can be also represented by a range [0, 2^addr_width-1]. This RFC hasn't
> > > > > > > adopted this design yet. We hope to have formal alignment in v1 discussion
> > > > > > > and then decide how to incorporate it in v2.
> > > > >
> > > > > I think the request was to include a start/end IO address hint when
> > > > > creating the ios. When the kernel creates it then it can return the
> > > >
> > > > is the hint single-range or could be multiple-ranges?
> > >
> > > David explained it here:
> > >
> > > https://lore.kernel.org/kvm/YMrKksUeNW%2FPEGPM@yekko/
> >
> > Apparently not well enough. I've attempted again in this thread.
> >
> > > qemu needs to be able to choose if it gets the 32 bit range or 64
> > > bit range.
> >
> > No. qemu needs to supply *both* the 32-bit and 64-bit range to its
> > guest, and therefore needs to request both from the host.
>
> As I understood your remarks each IOAS can only be one of the formats
> as they have a different PTE layout. So here I meant that qemu needs to
> be able to pick *for each IOAS* which of the two formats it is.

No. Both windows are in the same IOAS. A device could do DMA
simultaneously to both windows. More realistically, a 64-bit DMA
capable device and a non-64-bit DMA capable device could be in the
same group and be doing DMAs to different windows simultaneously.
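
To make "both windows in the same IOAS" concrete, here is a minimal
sketch in C of a single address space object carrying two disjoint
IOVA windows. Everything in it is an assumption made for illustration:
the names (struct ioas, struct ioas_window, ioas_add_window,
ioas_find_window) and the two-window limit are hypothetical, and none
of it is the proposed iommufd uAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not the iommufd uAPI. */
struct ioas_window {
	uint64_t start;	/* first IOVA covered by the window */
	uint64_t size;	/* window length in bytes, never zero */
};

#define IOAS_MAX_WINDOWS 2	/* assumed limit: one 32-bit, one 64-bit */

struct ioas {
	struct ioas_window win[IOAS_MAX_WINDOWS];
	size_t nr_windows;
};

/*
 * Add a window to the address space. Returns 0 on success, -1 if the
 * table is full, the window is empty, it wraps past 2^64, or it
 * overlaps a window that is already present.
 */
static int ioas_add_window(struct ioas *as, uint64_t start, uint64_t size)
{
	size_t i;

	assert(as != NULL);
	if (as->nr_windows == IOAS_MAX_WINDOWS || size == 0)
		return -1;
	if (start + size - 1 < start)	/* last byte wrapped around */
		return -1;
	for (i = 0; i < as->nr_windows; i++) {
		const struct ioas_window *w = &as->win[i];

		if (start <= w->start + w->size - 1 &&
		    w->start <= start + size - 1)
			return -1;
	}
	as->win[as->nr_windows].start = start;
	as->win[as->nr_windows].size = size;
	as->nr_windows++;
	return 0;
}

/*
 * Return the index of the window that fully contains
 * [iova, iova + len), or -1 if no single window does.
 */
static int ioas_find_window(const struct ioas *as, uint64_t iova,
			    uint64_t len)
{
	size_t i;

	assert(as != NULL);
	if (len == 0)
		return -1;
	for (i = 0; i < as->nr_windows; i++) {
		const struct ioas_window *w = &as->win[i];

		if (iova >= w->start && iova - w->start < w->size &&
		    len <= w->size - (iova - w->start))
			return (int)i;
	}
	return -1;
}
```

With one window added at a low address and a second at a high address
(the actual addresses and sizes are up to the platform and the guest),
DMA targets in either range resolve against the same struct ioas, which
is the situation two devices in one group would be in.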

> > Or rather, it *might* need to supply both. It will supply just the
> > 32-bit range by default, but the guest can request the 64-bit range
> > and/or remove and resize the 32-bit range via hypercall interfaces.
> > Vaguely recent Linux guests certainly will request the 64-bit range in
> > addition to the default 32-bit range.
>
> And this would result in two different IOAS objects

There might be two different IOAS objects for setup, but at some point
they need to be combined into one IOAS to which the device is actually
attached.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
