RE: [RFC 10/20] iommu/iommufd: Add IOMMU_DEVICE_GET_INFO

From: Tian, Kevin
Date: Thu Oct 14 2021 - 21:01:49 EST


> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Thursday, October 14, 2021 11:43 PM
>
> > > > I think the key is whether other archs allow driver to decide DMA
> > > > coherency and indirectly the underlying I/O page table format.
> > > > If yes, then I don't see a reason why such decision should not be
> > > > given to userspace for passthrough case.
> > >
> > > The choice all comes down to if the other arches have cache
> > > maintenance instructions in the VM that *don't work*
> >
> > Looks vfio always sets IOMMU_CACHE on all platforms as long as
> > iommu supports it (true on all platforms except intel iommu which
> > is dedicated for GPU):
> >
> > vfio_iommu_type1_attach_group()
> > {
> > 	...
> > 	if (iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
> > 		domain->prot |= IOMMU_CACHE;
> > 	...
> > }
> >
> > Should above be set according to whether a device is coherent?
>
> For IOMMU_CACHE there are two questions related to the overloaded
> meaning:
>
> - Should VFIO ask the IOMMU to use non-coherent DMA (ARM meaning)
> This depends on how the VFIO user expects to operate the DMA.
> If the VFIO user can issue cache maintenance ops then IOMMU_CACHE
> should be controlled by the user. I have no idea what platforms
> support user space cache maintenance ops.

But just as you said for the Intel meaning below, even if those ops are
privileged, a uAPI can be provided to support such usage if necessary.

>
> - Should VFIO ask the IOMMU to suppress no-snoop (Intel meaning)
> This depends if the VFIO user has access to wbinvd or not.
>
> wbinvd is a privileged instruction so normally userspace will not
> be able to access it.
>
> Per Paolo recommendation there should be a uAPI someplace that
> allows userspace to issue wbinvd - basically the suppress no-snoop
> is also user controllable.
>
> The two things are very similar and ultimately are a choice userspace
> should be making.

yes

>
> From something like a qemu perspective things are more murky - eg on
> ARM qemu needs to co-ordinate with the guest. Whatever IOMMU_CACHE
> mode VFIO is using must match the device coherent flag in the Linux
> guest. I'm guessing all Linux guest VMs only use coherent DMA for all
> devices today. I don't know if the cache maintenance ops are even
> permitted in an ARM VM.
>

I'll leave it to Jean to confirm. If only coherent DMA can be used in
the guest on other platforms, I suppose VFIO should not blindly set
IOMMU_CACHE, and conceptually it should deny assigning a non-coherent
device, since no coordination with the guest exists today.

So the bottom line is that we'll keep this no-snoop thing Intel-specific.
For the basic skeleton we won't support no-snoop, so the user needs to
set the enforce-snoop flag when creating an IOAS, as this RFC v1 does.
We also need to introduce a new flag instead of abusing IOMMU_CACHE in
the kernel. Other platforms may need a fix to deny non-coherent devices
for now (depending on the open question above).
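
To make the plan above concrete, here is a rough standalone sketch of the
policy. It is not iommufd code: every name in it (SKETCH_IOAS_ENFORCE_SNOOP,
the structs, and both helpers) is made up for illustration, and the error
codes are only a guess at what a real implementation might return.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: a dedicated bit instead of overloading IOMMU_CACHE. */
#define SKETCH_IOAS_ENFORCE_SNOOP	(1u << 0)

struct sketch_ioas {
	uint32_t flags;
};

struct sketch_device {
	bool iommu_can_enforce_snoop;	/* IOMMU behind the device can force snooping */
	bool dma_coherent;		/* device DMA is cache coherent */
};

/* Basic skeleton: no-snoop is unsupported, so enforce-snoop is mandatory. */
static int sketch_ioas_create(struct sketch_ioas *ioas, uint32_t flags)
{
	if (!(flags & SKETCH_IOAS_ENFORCE_SNOOP))
		return -EOPNOTSUPP;

	ioas->flags = flags;
	return 0;
}

/* Refuse devices that cannot honour the IOAS policy. */
static int sketch_ioas_attach(const struct sketch_ioas *ioas,
			      const struct sketch_device *dev)
{
	/* The IOAS promised enforced snooping; the IOMMU must be able to do it. */
	if ((ioas->flags & SKETCH_IOAS_ENFORCE_SNOOP) &&
	    !dev->iommu_can_enforce_snoop)
		return -EINVAL;

	/* No guest coordination exists today, so deny non-coherent devices. */
	if (!dev->dma_coherent)
		return -EINVAL;

	return 0;
}
```

The point of a separate flag is that the user's enforce-snoop request stays
distinct from the coherency attribute of the device itself, which is what
IOMMU_CACHE conflates today.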

Thanks
Kevin