Re: [RFC 11/20] iommu/iommufd: Add IOMMU_IOASID_ALLOC/FREE
From: Jason Gunthorpe
Date: Wed Sep 22 2021 - 10:09:21 EST
On Wed, Sep 22, 2021 at 03:40:25AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Wednesday, September 22, 2021 1:45 AM
> >
> > On Sun, Sep 19, 2021 at 02:38:39PM +0800, Liu Yi L wrote:
> > > This patch adds IOASID allocation/free interface per iommufd. When
> > > allocating an IOASID, userspace is expected to specify the type and
> > > format information for the target I/O page table.
> > >
> > > This RFC supports only one type (IOMMU_IOASID_TYPE_KERNEL_TYPE1V2),
> > > implying a kernel-managed I/O page table with vfio type1v2 mapping
> > > semantics. For this type the user should specify the addr_width of
> > > the I/O address space and whether the I/O page table is created in
> > > an iommu enfore_snoop format. enforce_snoop must be true at this point,
> > > as the false setting requires additional contract with KVM on handling
> > > WBINVD emulation, which can be added later.
> > >
> > > Userspace is expected to call IOMMU_CHECK_EXTENSION (see next patch)
> > > for what formats can be specified when allocating an IOASID.
> > >
> > > Open:
> > > - Devices on PPC platform currently use a different iommu driver in vfio.
> > > Per previous discussion they can also use vfio type1v2 as long as there
> > > is a way to claim a specific iova range from a system-wide address space.
> > > This requirement doesn't sound PPC specific, as addr_width for pci
> > devices
> > > can be also represented by a range [0, 2^addr_width-1]. This RFC hasn't
> > > adopted this design yet. We hope to have formal alignment in v1
> > discussion
> > > and then decide how to incorporate it in v2.
> >
> > I think the request was to include a start/end IO address hint when
> > creating the ios. When the kernel creates it then it can return the
>
> is the hint single-range or could be multiple-ranges?
David explained it here:
https://lore.kernel.org/kvm/YMrKksUeNW%2FPEGPM@yekko/
qeumu needs to be able to chooose if it gets the 32 bit range or 64
bit range.
So a 'range hint' will do the job
David also suggested this:
https://lore.kernel.org/kvm/YL6%2FbjHyuHJTn4Rd@yekko/
So I like this better:
struct iommu_ioasid_alloc {
__u32 argsz;
__u32 flags;
#define IOMMU_IOASID_ENFORCE_SNOOP (1 << 0)
#define IOMMU_IOASID_HINT_BASE_IOVA (1 << 1)
__aligned_u64 max_iova_hint;
__aligned_u64 base_iova_hint; // Used only if IOMMU_IOASID_HINT_BASE_IOVA
// For creating nested page tables
__u32 parent_ios_id;
__u32 format;
#define IOMMU_FORMAT_KERNEL 0
#define IOMMU_FORMAT_PPC_XXX 2
#define IOMMU_FORMAT_[..]
u32 format_flags; // Layout depends on format above
__aligned_u64 user_page_directory; // Used if parent_ios_id != 0
};
Again 'type' as an overall API indicator should not exist, feature
flags need to have clear narrow meanings.
This does both of David's suggestions at once. If quemu wants the 1G
limited region it could specify max_iova_hint = 1G, if it wants the
extend 64bit region with the hole it can give either the high base or
a large max_iova_hint. format/format_flags allows a further
device-specific escape if more specific customization is needed and is
needed to specify user space page tables anyhow.
> > ioas works well here I think. Use ioas_id to refer to the xarray
> > index.
>
> What about when introducing pasid to this uAPI? Then use ioas_id
> for the xarray index
Yes, ioas_id should always be the xarray index.
PASID needs to be called out as PASID or as a generic "hw description"
blob.
kvm's API to program the vPASID translation table should probably take
in a (iommufd,ioas_id,device_id) tuple and extract the IOMMU side
information using an in-kernel API. Userspace shouldn't have to
shuttle it around.
I'm starting to feel like the struct approach for describing this uAPI
might not scale well, but lets see..
Jason