RE: [PATCH V10 08/11] iommu/vt-d: Add svm/sva invalidate function

From: Tian, Kevin
Date: Wed Apr 01 2020 - 02:29:44 EST


> From: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> Sent: Wednesday, April 1, 2020 4:58 AM
>
> On Tue, 31 Mar 2020 02:49:21 +0000
> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>
> > > From: Auger Eric <eric.auger@xxxxxxxxxx>
> > > Sent: Sunday, March 29, 2020 11:34 PM
> > >
> > > Hi,
> > >
> > > On 3/28/20 11:01 AM, Tian, Kevin wrote:
> > > >> From: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> > > >> Sent: Saturday, March 21, 2020 7:28 AM
> > > >>
> > > >> When Shared Virtual Address (SVA) is enabled for a guest OS via
> > > >> vIOMMU, we need to provide invalidation support at IOMMU API and
> > > >> driver level. This patch adds Intel VT-d specific function to implement
> > > >> iommu passdown invalidate API for shared virtual address.
> > > >>
> > > >> The use case is for supporting caching structure invalidation
> > > >> of assigned SVM capable devices. Emulated IOMMU exposes queue
> > [...]
> > [...]
> > > to
> > > >> + * VT-d granularity. Invalidation is typically included in the unmap operation
> > > >> + * as a result of DMA or VFIO unmap. However, for assigned devices guest
> > > >> + * owns the first level page tables. Invalidations of translation caches in the
> > [...]
> > [...]
> > [...]
> > >
> > > >> inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> > > >> + /*
> > > >> + * PASID based IOTLB invalidation: PASID selective (per PASID),
> > > >> + * page selective (address granularity)
> > > >> + */
> > > >> + {0, 1, 1},
> > > >> + /* PASID based dev TLBs, only support all PASIDs or single PASID */
> > > >> + {1, 1, 0},
> > > >
> > > > Is this combination correct? when single PASID is being
> > > > specified, it is essentially a page-selective invalidation since
> > > > you need provide Address and Size.
> > > Isn't it the same when G=1? Still the addr/size is used. Doesn't
> > > it
> >
> > I thought addr/size is not used when G=1, but it might be wrong. I'm
> > checking with our vt-d spec owner.
> >
>
> > > correspond to IOMMU_INV_GRANU_ADDR with
> > > IOMMU_INV_ADDR_FLAGS_PASID flag
> > > unset?
> > >
> > > so {0, 0, 1}?
> >
> I am not sure I got your logic. The three fields correspond to
> IOMMU_INV_GRANU_DOMAIN, /* domain-selective invalidation */
> IOMMU_INV_GRANU_PASID, /* PASID-selective invalidation */
> IOMMU_INV_GRANU_ADDR, /* page-selective invalidation */
>
> For devTLB, we use domain as global since there is no domain. Then I
> came up with {1, 1, 0}, which means we could have global and pasid
> granu invalidation for PASID based devTLB.
>
> If the caller also provide addr and S bit, the flush routine will put

"also" -> "must": VT-d requires addr/size to be provided in the
devTLB descriptor, which is why Eric suggests {0, 0, 1}.
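
To make the {0, 0, 1} suggestion concrete, here is a rough userspace sketch
of a type/granularity support matrix in which the devTLB row accepts only
address granularity. The enum names, the table name and
check_inv_request() are hypothetical stand-ins, not the identifiers from
the patch. The all-zero PASID cache row assumes such requests from
userspace get rejected, and -EINVAL stands in for whatever error code the
driver ends up returning.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the uapi cache types and granularities. */
enum cache_type { CACHE_IOTLB, CACHE_DEV_IOTLB, CACHE_PASID, CACHE_TYPE_NR };
enum granu { GRANU_DOMAIN, GRANU_PASID, GRANU_ADDR, GRANU_NR };

/*
 * Rows are cache types, columns are {domain, pasid, addr} granularity.
 * The devTLB row is {0, 0, 1}: the descriptor always carries addr/size,
 * so only address granularity is accepted (assumption following the
 * discussion, not the posted patch).
 */
static const bool granu_supported[CACHE_TYPE_NR][GRANU_NR] = {
	[CACHE_IOTLB]     = { false, true,  true  },
	[CACHE_DEV_IOTLB] = { false, false, true  },
	[CACHE_PASID]     = { false, false, false },
};

/* Return 0 if the (type, granu) pair is accepted, -EINVAL otherwise. */
int check_inv_request(unsigned int type, unsigned int granu)
{
	if (type >= CACHE_TYPE_NR || granu >= GRANU_NR)
		return -EINVAL;

	return granu_supported[type][granu] ? 0 : -EINVAL;
}
```

With this layout a devTLB request is accepted only when the caller
supplies address granularity, so the flush routine never has to guess
whether addr/size are valid.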

> that into QI descriptor. I know this is a little odd, but from the
> granu translation p.o.v. VT-d spec has no G bit for page selective
> invalidation.

We don't need such an odd approach if we can do it properly.

>
> > I have one more open:
> >
> > How does userspace know which invalidation type/gran is supported?
> > I didn't see such capability reporting in Yi's VFIO vSVA patch set.
> > Do we want the user/kernel assume the same capability set if they are
> > architectural? However the kernel could also do some optimization
> > e.g. hide devtlb invalidation capability given that the kernel
> > already invalidate devtlb automatically when serving iotlb
> > invalidation...
> >
> In general, we are trending to use VFIO capability chain to expose iommu
> capabilities.
>
> But for architectural features such as type/granu, we have to assume
> the same capability between host & guest. Granu and types are not
> enumerated on the host IOMMU either.
>
> For devTLB optimization, I agree we need to expose a capability to
> the guest stating that implicit devtlb invalidation is supported.
> Otherwise, if Linux guest runs on other OSes may not support implicit
> devtlb invalidation.
>
> Right Yi?

Thanks for the explanation. So we assume that all operations defined in
the spec are supported, and there is no need to expose them one by one.
As for the optimization, I'm fine with doing it later.

>
> > Thanks
> > Kevin
> >
> > >
> > > Thanks
> > >
> > > Eric
> > >
> > > >
> > > >> + /* PASID cache */
> > > >
> > > > PASID cache is fully managed by the host. Guest PASID cache
> > > > invalidation is interpreted by vIOMMU for bind and unbind
> > > > operations. I don't think we should accept any PASID cache
> > > > invalidation from userspace or guest.
> > [...]
> > >
> inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU
> > [...]
> > > >
> > > > btw do we really need both map and table here? Can't we just
> > > > use one table with unsupported granularity marked as a special
> > > > value?
> > > >
> > [...]
> > > >
> > > > -ENOTSUPP?
> > > >
> > [...]
> > > >
> > > > granularity == IOMMU_INV_GRANU_ADDR? otherwise it's unclear
> > > > why IOMMU_INV_GRANU_DOMAIN also needs size check.
> > > >
> > [...]
> > > >>> addr_info.addr),
> > [...]
> > [...]
> > > >> + if (info->ats_enabled) {
> > > >> + qi_flush_dev_iotlb_pasid(iommu, sid, info->pfsid,
> > [...]
> > > >>> pfsid,
> > > >> + inv_info->addr_info.pasid, info->ats_qdep,
> > > >> + inv_info->addr_info.addr, size,
> > > >> + granu);
> > [...]
> > [...]
> > > >>> pasid_info.pasid);
> > > >> +
> > > >
> > > > as earlier comment, we shouldn't allow userspace or guest to
> > > > invalidate PASID cache
> > > >
> > [...]
> > > >
> >
>
> [Jacob Pan]

Thanks
Kevin