Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

From: Jacob Pan
Date: Thu May 13 2021 - 19:38:07 EST


Hi Jason,

On Thu, 13 May 2021 19:31:22 -0300, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Thu, May 13, 2021 at 01:22:51PM -0700, Jacob Pan wrote:
> > Hi Tony,
> >
> > On Thu, 13 May 2021 12:57:49 -0700, "Luck, Tony" <tony.luck@xxxxxxxxx>
> > wrote:
> >
> > > On Thu, May 13, 2021 at 12:46:21PM -0700, Jacob Pan wrote:
> > > > It seems there are two options:
> > > > 1. Add a new IOMMU API to set up a system PASID with a *separate*
> > > > IOMMU page table/domain, mark the device is PASID only with a flag.
> > > > Use DMA APIs to explicit map/unmap. Based on this PASID-only flag,
> > > > Vendor IOMMU driver will decide whether to use system PASID domain
> > > > during map/unmap. Not clear if we also need to make IOVA==kernel VA.
> > > >
> > > > 2. Add a new IOMMU API to setup a system PASID which points to
> > > > init_mm.pgd. This API only allows trusted device to bind with the
> > > > system PASID at its own risk. There is no need for DMA API. This is
> > > > the same as the current code except with an explicit API.
> > > >
> > > > Which option?
> > >
> > > Option #1 looks cleaner to me. Option #2 gives access to bits
> > > of memory that the users of system PASID shouldn't ever need
> > > to touch ... just map regions of memory that the kernel has
> > > a "struct page" for.
> > >
> > > What does "use DMA APIs to explicitly map/unmap" mean? Is that
> > > for the whole region?
> > >
> > If we map the entire kernel direct map during system PASID setup, then
> > we don't need to use DMA API to map/unmap certain range.
> >
> > I was thinking this system PASID page table could be on-demand. The
> > mapping is built by explicit use of DMA map/unmap APIs.
>
> Option 1 should be the PASID works exactly like a normal RID and uses
> all the normal DMA APIs and IOMMU mechanisms, whatever the platform
> implements. This might mean an iommu update on every operation or not.
>
> > > I'm expecting that once this system PASID has been initialized,
> > > then any accelerator device with a kernel use case would use the
> > > same PASID. I.e. DSA for page clearing, IAX for ZSwap compression
> > > & decompression, etc.
> > >
> > OK, sounds like we have to map the entire kernel VA with struct page as
> > you said. So we still by-pass DMA APIs, can we all agree on that?
>
> Option 2 should be the faster option, but not available in all cases.
>
> Option 1 isn't optional. DMA and IOMMU code has to be portable and
> this is the portable API.
>
> If you want to do option 1 and option 2 then give it a go, but in most
> common cases with the IOMMU in a direct map you shouldn't get a
> notable performance win.
>
Looks like we are converging. Let me summarize the takeaways:
1. Remove the IOMMU_SVA_BIND_SUPERVISOR flag from this patch. In fact, there
will be no flags at all for iommu_sva_bind_device().
2. Remove all supervisor-SVA-related VT-d and idxd code.
3. Create API iommu_setup_system_pasid_direct_map(option_flag)
	if (option_flag == 1)
		iommu_domain_alloc(IOMMU_DOMAIN_DMA);
	if (option_flag == 2)
		iommu_domain_alloc(IOMMU_DOMAIN_DIRECT); //new domain type?
		setup IOMMU page tables mirroring the direct map
4. Create API iommu_enable_dev_direct_map(struct device *dev, &pasid, &option)
- Drivers call this API to get the system PASID and to learn which option
is available for it
- Mark the device as PASID-only, perhaps with a new flag,
device->dev_iommu->pasid_only = 1
5. The DMA API IOMMU vendor ops will act on the pasid_only flag to decide
whether a mapping is for the system PASID page tables.
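
To make items 3 to 5 concrete, here is a rough userspace mock of the flow.
None of this is existing kernel API: the mock_* functions, the enum values,
and struct mock_device are placeholders invented for illustration. Treating
option 1 as the explicit DMA map/unmap path and option 2 as the direct-map
path is an assumption based on the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option values from item 3; not existing kernel constants. */
enum sys_pasid_option {
	SYS_PASID_OPT_NONE = 0,   /* system PASID not set up yet */
	SYS_PASID_OPT_DMA = 1,    /* option 1: mappings built via DMA map/unmap */
	SYS_PASID_OPT_DIRECT = 2, /* option 2: page tables mirror the direct map */
};

/* Stand-in for struct device; only the proposed pasid_only flag is modelled. */
struct mock_device {
	bool pasid_only;
};

/* Global state standing in for whatever the IOMMU core would keep. */
static unsigned int system_pasid;
static enum sys_pasid_option system_option = SYS_PASID_OPT_NONE;

/* Item 3: set up the system PASID once. Returns 0, or -1 on a bad option. */
int mock_setup_system_pasid(enum sys_pasid_option opt, unsigned int pasid)
{
	if (opt != SYS_PASID_OPT_DMA && opt != SYS_PASID_OPT_DIRECT)
		return -1;
	system_option = opt;
	system_pasid = pasid;
	return 0;
}

/*
 * Item 4: a driver asks for the system PASID and the option in effect.
 * On success the device is marked PASID-only. Returns -1 if the system
 * PASID has not been set up.
 */
int mock_enable_dev_direct_map(struct mock_device *dev, unsigned int *pasid,
			       enum sys_pasid_option *opt)
{
	assert(dev && pasid && opt);
	if (system_option == SYS_PASID_OPT_NONE)
		return -1;
	*pasid = system_pasid;
	*opt = system_option;
	dev->pasid_only = true;
	return 0;
}

/*
 * Item 5: the check a vendor DMA op might make. Only a PASID-only device
 * under option 1 needs its map/unmap applied to the system PASID tables;
 * under option 2 the tables are already populated.
 */
bool mock_dma_map_targets_system_pasid(const struct mock_device *dev)
{
	return dev->pasid_only && system_option == SYS_PASID_OPT_DMA;
}
```

A driver would call the enable function once at probe time and then use the
returned option to decide whether it still has to go through DMA map/unmap
for each buffer.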

Does this make sense?


> Jason


Thanks,

Jacob