RE: [RFC 11/20] iommu/iommufd: Add IOMMU_IOASID_ALLOC/FREE

From: Tian, Kevin
Date: Thu Sep 23 2021 - 09:21:05 EST


> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Thursday, September 23, 2021 9:02 PM
>
> On Thu, Sep 23, 2021 at 12:45:17PM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > Sent: Thursday, September 23, 2021 8:31 PM
> > >
> > > On Thu, Sep 23, 2021 at 12:22:23PM +0000, Tian, Kevin wrote:
> > > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > > > Sent: Thursday, September 23, 2021 8:07 PM
> > > > >
> > > > > On Thu, Sep 23, 2021 at 09:14:58AM +0000, Tian, Kevin wrote:
> > > > >
> > > > > > currently the type is aimed to differentiate three usages:
> > > > > >
> > > > > > - kernel-managed I/O page table
> > > > > > - user-managed I/O page table
> > > > > > - shared I/O page table (e.g. with mm, or ept)
> > > > >
> > > > > Creating a shared ios is something that should probably be a different
> > > > > command.
> > > >
> > > > why? I didn't understand the criteria here...
> > >
> > > I suspect the input args will be very different, no?
> >
> > yes, but can't the structure be extended to incorporate it?
>
> You need to be thoughtful, giant structures with endless combinations
> of optional fields turn out very hard. I haven't even seen what args
> this shared thing will need, but I'm guessing it is almost none, so
> maybe a new call is OK?

To judge this, it looks like we may have to do some practice on this
front, e.g. coming up with an example structure for the future intended
usages and then seeing whether one structure can fit?
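
As a rough starting point for that exercise, here is a minimal sketch of
what a single consolidated structure might look like. It is not taken from
the RFC patches: every name below (ioas_kind, ioas_alloc_args, the union
members, ioas_alloc_validate) is hypothetical, and the per-usage arguments
are only guesses based on this thread (a page table pointer and format for
the user-managed case, a kvm handle and format for ept, nothing extra for
kernel-managed or mm sharing).

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical names throughout; none of these come from the RFC patches. */
enum ioas_kind {
	IOAS_KERNEL_MANAGED,	/* kernel owns the I/O page table */
	IOAS_USER_MANAGED,	/* user supplies the I/O page table */
	IOAS_SHARED_MM,		/* share the page table of the current mm */
	IOAS_SHARED_EPT,	/* share a KVM EPT */
};

struct ioas_alloc_args {
	uint32_t argsz;		/* size of this struct, for extensibility */
	uint32_t flags;
	uint32_t kind;		/* enum ioas_kind */
	uint32_t pad;
	union {
		struct {
			uint32_t format;	/* assumed: vendor PTE format id */
			uint32_t pad;
			uint64_t pgtable_ptr;	/* assumed: user page table root */
		} user;
		struct {
			int32_t kvm_fd;		/* assumed: handle into kvm */
			uint32_t format;	/* assumed: ept format info */
		} ept;
		/* kernel-managed and shared-mm need no extra arguments here */
	} u;
};

/* Returns 0 if the combination of fields is acceptable, -1 otherwise. */
static int ioas_alloc_validate(const struct ioas_alloc_args *a)
{
	if (a->argsz < sizeof(*a))
		return -1;

	switch (a->kind) {
	case IOAS_KERNEL_MANAGED:
	case IOAS_SHARED_MM:
		return 0;
	case IOAS_USER_MANAGED:
		return a->u.user.pgtable_ptr ? 0 : -1;
	case IOAS_SHARED_EPT:
		return a->u.ept.kvm_fd >= 0 ? 0 : -1;
	default:
		return -1;
	}
}
```

If the per-kind validation keeps growing the way the switch above does, that
would be a hint that separate calls are the cleaner choice; if most kinds end
up with no extra arguments, one structure may still fit.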

>
> If it is literally just 'give me an ioas for current mm' then it has
> no args or complexity at all.

for mm, yes, should be simple. for ept it might be more complex e.g.
requiring a handle in kvm and some other format info to match ept
page table.

>
> > > > > > > we can remove 'type', but is FORMAT_KERNEL/USER/SHARED a good
> > > > > > indicator? their difference is not about format.
> > > > >
> > > > > Format should be
> > > > >
> > > > >
> FORMAT_KERNEL/FORMAT_INTEL_PTE_V1/FORMAT_INTEL_PTE_V2/etc
> > > >
> > > > INTEL_PTE_V1/V2 are formats. Why is kernel-managed called a format?
> > >
> > > > So long as we are using structs we need to have values when the field
> > > isn't being used. FORMAT_KERNEL is a reasonable value to have when we
> > > are not creating a userspace page table.
> > >
> > > Alternatively a userspace page table could have a different API
> >
> > I don't know. Your comments really confused me on what's the right
> > way to design the uAPI. If you still remember, the original v1 proposal
> > introduced different uAPIs for kernel/user-managed cases. Then you
> > recommended to consolidate everything related to ioas in one allocation
> > command.
>
> This is because you had almost completely duplicated the input args
> between the two calls.
>
> If it turns out they have very different args, then they should have
> different calls.
>
> > > > - open iommufd
> > > > - create an ioas
> > > > - attach vfio device to ioasid, with vPASID info
> > > > * vfio converts vPASID to pPASID and then calls iommufd_device_attach_ioasid()
> > > > * the latter then installs ioas to the IOMMU with RID/PASID
> > >
> > > This was your flow for mdev's, I've always been talking about wanting
> > > to see this supported for all use cases, including physical PCI
> > > devices w/ PASID support.
> >
> > this is not a flow for mdev. It's also required for pdev on Intel platform,
> > because the pasid table is in HPA space thus must be managed by host
> > kernel. Even with no translation we still need the user to provide the pasid info.
>
> There should be no mandatory vPASID stuff in most of these flows, that
> is just a special thing ENQCMD virtualization needs. If userspace
> isn't doing ENQCMD virtualization it shouldn't need to touch this
> stuff.

No. For one, we also support SVA w/o using ENQCMD. For two, the key
is that on Intel the PASID table cannot be delegated to userspace the
way it is on ARM or AMD. This implies that any pasid the userspace
wants to enable must be configured via the kernel.

>
> > as explained earlier, on Intel platform the user always needs to provide
> > a PASID in the attaching call. whether it's directly used (for pdev)
> > or translated (for mdev) is the underlying driver thing. From kernel
> > p.o.v, since this PASID is provided by the user, it's fine to call it vPASID
> > in the uAPI.
>
> I've always disagreed with this. There should be an option for the
> kernel to pick an appropriate PASID for portability to other IOMMUs
> and simplicity of the interface.
>
> You need to keep it clear what is in the minimum basic path and what
> is needed for special cases, like ENQCMD virtualization.
>
> Not every user of iommufd is doing virtualization.
>

just a short summary of the PASID model from the previous design RFC:

for arm/amd:
  - pasid space delegated to userspace
  - pasid table delegated to userspace
  - just one call to bind pasid_table(), then pasids are fully managed
    by the user

for intel:
  - pasid table is always managed by kernel
  - for pdev,
      - pasid space is delegated to userspace
      - attach_ioasid(dev, ioasid, pasid) so the kernel can set up the
        pasid entry
  - for mdev,
      - pasid space is managed by userspace
      - attach_ioasid(dev, ioasid, vpasid). vfio converts vpasid to
        ppasid. iommufd sets up the ppasid entry
      - additionally, a contract with kvm to set up CPU pasid
        translation if enqcmd is used
  - to unify pdev/mdev, just always call it vpasid in attach_ioasid().
    let the underlying driver figure out whether vpasid should be
    translated.
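
To make the intel part of the summary concrete, below is a minimal
userspace-style sketch of the unified attach path. It is only a toy model,
not kernel code: sketch_dev, SKETCH_MAX_PASID, sketch_translate() and
sketch_attach_ioasid() are all hypothetical names, the table is a plain
array, and the vpasid-to-ppasid mapping for mdev is assumed to be a simple
per-device lookup. The point is just that the caller always passes a
vpasid and the driver side decides whether it is used directly (pdev) or
translated (mdev) before the kernel-owned pasid entry is written.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_PASID 16	/* tiny table, for illustration only */

/* Hypothetical device model; not a real vfio or iommufd structure. */
struct sketch_dev {
	bool is_mdev;
	/* mdev only: vpasid -> ppasid mapping kept by the driver */
	uint32_t v2p[SKETCH_MAX_PASID];
	/* kernel-managed pasid table: entry holds the attached ioasid */
	uint32_t pasid_table[SKETCH_MAX_PASID];
};

/* Driver step: pdev uses the vpasid as-is, mdev translates it. */
static int sketch_translate(const struct sketch_dev *dev, uint32_t vpasid,
			    uint32_t *ppasid)
{
	if (vpasid >= SKETCH_MAX_PASID)
		return -1;

	*ppasid = dev->is_mdev ? dev->v2p[vpasid] : vpasid;
	return *ppasid < SKETCH_MAX_PASID ? 0 : -1;
}

/*
 * Unified entry point: the user always passes a vpasid. Returns the
 * physical pasid whose entry was set up, or -1 on error.
 */
static int sketch_attach_ioasid(struct sketch_dev *dev, uint32_t ioasid,
				uint32_t vpasid)
{
	uint32_t ppasid;

	if (sketch_translate(dev, vpasid, &ppasid))
		return -1;

	/* Only the kernel side ever writes the pasid table. */
	dev->pasid_table[ppasid] = ioasid;
	return (int)ppasid;
}
```

For a pdev, sketch_attach_ioasid(&dev, 7, 3) fills entry 3; for an mdev
whose v2p[3] is 9, the same call fills entry 9 instead. The kvm contract
for enqcmd is left out of this sketch.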

Thanks
Kevin