Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

From: Jason Gunthorpe
Date: Wed Apr 28 2021 - 16:46:36 EST


On Wed, Apr 28, 2021 at 06:34:11AM +0000, Tian, Kevin wrote:

> > If /dev/ioasid is single HW page table only then I would focus on that
> > implementation and leave it to userspace to span different
> > /dev/ioasids if needed.
> >
> > > OK, now I see where the disconnection comes from. In my context ioasid
> > > is the identifier that is actually used in the wire, but seems you treat it as
> > > a sw-defined namespace purely for representing page tables. We should
> > > clear this concept first before further discussing other details. 😊
> >
> > There is no general HW requirement that every IO page table be
> > referred to by the same PASID and this API would have to support
>
> Yes, but what is the value of allowing multiple PASIDs referring to
> the same I/O page table (except the nesting pgtable case)? Doesn't it
> lead to poor iotlb efficiency issue similar to multiple iommu domains
> referring to the same page table?

I think iotlb efficiency is up to the platform.

The general use case is to make an IOASID for something like the GPA
and use it concurrently with, say, three devices:
- VFIO (not PASID)
- VDPA (PASID capable HW)
- 'Future VDPA storage' (PASID capable HW)

The uAPI for this should be very general and the kernel should decide
the optimal way to configure the HW. Maybe it is one page table and
one PASID, or maybe it is something else.

Allowing the kernel to choose the PASID once it knows the RID gives
the most generality.
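
As a rough sketch of that idea: userspace only holds a handle for the
address space, and the PASID (if any) is picked when a device is
attached. Every name below is hypothetical and made up for
illustration; none of this is an existing kernel or uAPI interface,
and the per-RID PASID counter is just one assumed allocation policy.

```c
/* Illustrative sketch only; none of these names are real kernel or uAPI symbols. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NO_PASID UINT32_MAX	/* device is identified by its RID alone */

/* A software handle for one IO address space (e.g. the GPA). */
struct sketch_ioasid {
	int id;
};

struct sketch_device {
	uint16_t rid;		/* PCI requester ID */
	bool pasid_capable;
	uint32_t next_pasid;	/* assumed per-RID allocator; a platform
				 * could use a global pool instead */
};

struct sketch_attachment {
	int ioasid_id;
	uint16_t rid;
	uint32_t pasid;		/* SKETCH_NO_PASID for non-PASID devices */
};

/*
 * The caller never names a PASID. It is chosen here, at attach time,
 * once the RID and the device's capabilities are known.
 */
static struct sketch_attachment
sketch_attach(const struct sketch_ioasid *ioas, struct sketch_device *dev)
{
	struct sketch_attachment at;

	assert(ioas != NULL && dev != NULL);
	at.ioasid_id = ioas->id;
	at.rid = dev->rid;
	at.pasid = dev->pasid_capable ? dev->next_pasid++ : SKETCH_NO_PASID;
	return at;
}
```

Attaching the same handle to a non-PASID VFIO device and to a
PASID-capable VDPA device would then give two attachments to one
address space, with a PASID filled in only for the latter.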

> > non-PASID IO page tables as well. So I'd keep the two things
> > separated in the uAPI - even though the kernel today has a global
> > PASID pool.
>
> for non-PASID usages the allocated PASID will be wasted if we don't
> separate ioasid from pasid. But it may be worthwhile given 1m available
> pasids and the simplification in the uAPI which only needs to care about
> one id space then.

I'd prefer this be a platform choice and not forced in the uAPI,
because we can never go back on it if we later find that, yes, we do
need to optimize here. I understand many platforms have different
available PASID spaces already.

> > Simple things like DPDK can use #2 and potentially have better PASID
> > limits. hypervisors will most likely have to use #1, but it depends on
> > how their vIOMMU interface works.
>
> Can you elaborate why DPDK wants to use #2 i.e. not using a global
> PASID?

It gives the kernel an option to make the decision about the PASID
when it has the full information, including the RID.

> > I think the name IOASID is fine for the uAPI, the kernel version can
> > be called ioasid_id or something.
>
> ioasid is already an id and then ioasid_id just adds confusion. Another
> point is that ioasid is currently used to represent both PCI PASID and
> ARM substream ID in the kernel. It implies that if we want to separate
> ioasid and pasid in the uAPI the 'pasid' also needs to be replaced with
> another general term usable for substream ID. Are we making the
> terms too confusing here?

This is also why I am not so sure about exposing the PASID in the API,
because it is ultimately a HW specific item.

As I said to David, one avenue is to have some generic uAPI that is
very general and keep all this deeply detailed stuff, that really only
matters for qemu, as part of a more HW specific vIOMMU driver
interface.

Jason