Re: [RFC 10/20] iommu/iommufd: Add IOMMU_DEVICE_GET_INFO

From: Jason Gunthorpe
Date: Thu Sep 23 2021 - 08:22:28 EST


On Thu, Sep 23, 2021 at 12:05:29PM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Thursday, September 23, 2021 7:27 PM
> >
> > On Thu, Sep 23, 2021 at 11:15:24AM +0100, Jean-Philippe Brucker wrote:
> >
> > > So we can only tell userspace "No_snoop is not supported" (provided we
> > > even want to allow them to enable No_snoop). Users in control of stage-1
> > > tables can create non-cacheable mappings through MAIR attributes.
> >
> > My point is that ARM is using IOMMU_CACHE to control the overall
> > cachability of the DMA
> >
> > ie not specifying IOMMU_CACHE requires using the arch specific DMA
> > cache flushers.
> >
> > Intel never uses arch specifc DMA cache flushers, and instead is
> > abusing IOMMU_CACHE to mean IOMMU_BLOCK_NO_SNOOP on DMA that
> > is always
> > cachable.
>
> it uses IOMMU_CACHE to force all DMAs to snoop, including those which
> has non_snoop flag and wouldn't snoop cache if iommu is disabled. Nothing
> is blocked.

I see it differently, on Intel the only way to bypass the cache with
DMA is to specify the no-snoop bit in the TLP. The IOMMU PTE flag we
are talking about tells the IOMMU to ignore the no snoop bit.

Again, Intel arch in the kernel does not support the DMA cache flush
arch API and *DOES NOT* support incoherent DMA at all.

ARM *does* implement the DMA cache flush arch API and is using
IOMMU_CACHE to control if the caller will, or will not call the cache
flushes.

This is fundamentally different from what Intel is using it for.

> but why do you call it abuse? IOMMU_CACHE was first introduced for
> Intel platform:

IMHO ARM changed the meaning when Robin linked IOMMU_CACHE to
dma_is_coherent stuff. At that point it became linked to 'do I need to
call arch cache flushers or not'.

> > These are different things and need different bits. Since the ARM path
> > has a lot more code supporting it, I'd suggest Intel should change
> > their code to use IOMMU_BLOCK_NO_SNOOP and abandon IOMMU_CACHE.
>
> I didn't fully get this point. The end result is same, i.e. making the DMA
> cache-coherent when IOMMU_CACHE is set. Or if you help define the
> behavior of IOMMU_CACHE, what will you define now?

It is clearly specifying how the kernel API works:

!IOMMU_CACHE
must call arch cache flushers
IOMMU_CACHE -
do not call arch cache flushers
IOMMU_CACHE|IOMMU_BLOCK_NO_SNOOP -
dot not arch cache flushers, and ignore the no snoop bit.

On Intel it should refuse to create a !IOMMU_CACHE since the HW can't
do that. All IOMMU formats can support IOMMU_CACHE. Only the special
no-snoop IOPTE format can support the final one, and it is only useful
for iommufd/vfio users that are interacting with VMs and wbvind.

Jason