Re: [RFC 11/20] iommu/iommufd: Add IOMMU_IOASID_ALLOC/FREE

From: David Gibson
Date: Mon Oct 18 2021 - 00:14:35 EST


On Thu, Oct 14, 2021 at 11:52:08AM -0300, Jason Gunthorpe wrote:
> On Thu, Oct 14, 2021 at 03:53:33PM +1100, David Gibson wrote:
>
> > > My feeling is that qemu should be dealing with the host != target
> > > case, not the kernel.
> > >
> > > The kernel's job should be to expose the IOMMU HW it has, with all
> > > features accessible, to userspace.
> >
> > See... to me this is contrary to the point we agreed on above.
>
> I'm not thinking of these as exclusive ideas.
>
> The IOCTL interface in iommu can quite happily expose:
> Create IOAS generically
> Manipulate IOAS generically
> Create IOAS with IOMMU driver specific attributes
> HW specific Manipulate IOAS
>
> IOCTL commands all together.
>
> So long as everything is focused on a generic in-kernel IOAS object it
> is fine to have multiple ways in the uAPI to create and manipulate the
> objects.
>
> When I speak about a generic interface I mean "Create IOAS
> generically" - ie a set of IOCTLs that work on most IOMMU HW and can
> be relied upon by things like DPDK/etc to always work and be portable.
> This is why I like "hints" to provide some limited widely applicable
> micro-optimization.
>
> When I said "expose the IOMMU HW it has with all features accessible"
> I mean also providing "Create IOAS with IOMMU driver specific
> attributes".
>
> These other IOCTLs would allow the IOMMU driver to expose every
> configuration knob its HW has, in a natural HW centric language.
> There is no pretense of genericness here, no crazy foo=A, foo=B hidden
> device specific interface.
>
> Think of it as a high level/low level interface to the same thing.

Ok, I see what you mean.

> > Those are certainly wrong, but they came about explicitly by *not*
> > being generic rather than by being too generic. So I'm really
> > confused as to what you're arguing for / against.
>
> IMHO it is not having a PPC specific interface that was the problem,
> it was making the PPC specific interface exclusive to the type 1
> interface. If type 1 continued to work on PPC then DPDK/etc would
> never have learned PPC specific code.

Ok, but the reason this happened is that the initial version of type 1
*could not* be used on PPC. The original Type 1 implicitly promised a
"large" IOVA range beginning at IOVA 0 without any real way of
specifying or discovering how large that range was. Since ppc could
typically only give a 2GiB range at IOVA 0, that wasn't usable.
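
As a rough illustration of that mismatch (all names here are hypothetical,
not an existing kernel struct or uAPI), a userspace program that assumes it
can map at any IOVA from 0 upwards fails as soon as it steps outside a 2GiB
window, and it has no way to find out where the limit is:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a usable IOVA window (illustration only). */
struct iova_window {
	uint64_t base;	/* first usable IOVA */
	uint64_t size;	/* length in bytes */
};

/* Return true if [iova, iova + len) lies entirely inside the window. */
static bool iova_range_fits(const struct iova_window *w,
			    uint64_t iova, uint64_t len)
{
	assert(w != NULL);
	if (len == 0 || iova < w->base)
		return false;
	/* Compare offsets rather than end addresses to avoid overflow. */
	uint64_t off = iova - w->base;
	return off < w->size && len <= w->size - off;
}
```

With a window of { .base = 0, .size = 2GiB }, a mapping at IOVA 1GiB fits,
while one at IOVA 4GiB does not.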

That's why I say the problem was that Type 1 was not generic enough. I
believe the current version of Type 1 has addressed this - at least
enough to be usable in common cases. But by this time the ppc backend
is already out there, so no-one's had the capacity to go back and make
ppc work with Type 1.

> For iommufd with the high/low interface each IOMMU HW should ask basic
> questions:
>
> - What should the generic high level interface do on this HW?
> For instance what should 'Create IOAS generically' do for PPC?
> It should not fail, it should create *something*
> What is the best thing for DPDK?
> I guess the 64 bit window is most broadly useful.

Right, which means the kernel must (at least in the common case) have
the capacity to choose and report a non-zero base-IOVA.

Hrm... which makes me think... if we allow this for the common
kernel-managed case, do we even need to have capacity in the high-level
interface for reporting IO holes? If the kernel can choose a non-zero
base, it could just choose on x86 to place its advertised window
above the IO hole.
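
A minimal sketch of what "place the window above the hole" could look like,
assuming the only constraints are the end of the hole and a power-of-two
alignment. The helper name and the alignment parameter are made up for
illustration; a real implementation would have more to consider:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the lowest align-multiple at or above
 * hole_end (the first address past the IO hole).
 */
static uint64_t window_base_above_hole(uint64_t hole_end, uint64_t align)
{
	/* align must be a non-zero power of two */
	assert(align != 0 && (align & (align - 1)) == 0);
	return (hole_end + align - 1) & ~(align - 1);
}
```

For example, with the x86 MSI range ending at 0xFEEFFFFF (so hole_end is
0xFEF00000) and a 1GiB alignment, the advertised window would start at
0x100000000, i.e. 4GiB.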

> - How to accurately describe the HW in terms of standard IOAS objects
> and where to put HW specific structs to support this.
>
> This is where PPC would decide how best to expose a control over
> its low/high window (eg 1,2,3 IOAS). Whatever the IOMMU driver
> wants, so long as it fits into the kernel IOAS model facing the
> connected device driver.
>
> QEMU would have IOMMU userspace drivers. One would be the "generic
> driver" using only the high level generic interface. It should work as
> best it can on all HW devices. This is the fallback path you talked
> of.
>
> QEMU would also have HW specific IOMMU userspace drivers that know how
> to operate the exact HW. eg these drivers would know how to use
> userspace page tables, how to form IOPTEs and how to access the
> special features.
>
> This is how QEMU could use an optimized path with nested page tables,
> for instance.

The concept makes sense in general. The devil's in the details, as usual.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
