Re: [RFC] /dev/ioasid uAPI proposal

From: Jason Wang
Date: Wed Jun 02 2021 - 22:53:05 EST



On 2021/6/3 4:37 AM, Alex Williamson wrote:
On Wed, 2 Jun 2021 16:54:04 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

On Wed, Jun 02, 2021 at 01:00:53PM -0600, Alex Williamson wrote:
Right, the device can generate the no-snoop transactions, but it's the
IOMMU that essentially determines whether those transactions are
actually still cache coherent, AIUI.
Wow, this is really confusing stuff in the code.

At the PCI level there is a TLP bit called no-snoop that is platform
specific. The general intention is to allow devices to selectively
bypass the CPU caching for DMAs. GPUs like to use this feature for
performance.
Yes

I assume there are some exciting security issues here. Looks like
allowing cache bypass does something bad inside VMs? Looks like
allowing the VM to use the cache clear instruction that is mandatory
with cache bypass DMA causes some QOS issues? OK.
IIRC, largely a DoS issue if userspace gets to choose when to emulate
wbinvd rather than it being demanded for correct operation.

So how does it work?

What I see in the intel/iommu.c is that some domains support "snoop
control" or not, based on some HW flag. This indicates if the
DMA_PTE_SNP bit is supported on a page by page basis or not.

Since x86 always leans toward "DMA cache coherent" I'm reading some
tea leaves here:

IOMMU_CAP_CACHE_COHERENCY, /* IOMMU can enforce cache coherent DMA
transactions */

And guessing that IOMMUs that implement DMA_PTE_SNP will ignore the
snoop bit in TLPs for IOVAs that have DMA_PTE_SNP set?
That's my understanding as well.

Further, I guess IOMMUs that don't support PTE_SNP, or have
DMA_PTE_SNP clear will always honour the snoop bit. (backwards compat
and all)
Yes.

So, a missing IOMMU_CAP_CACHE_COHERENCY does not mean the IOMMU is DMA
incoherent with the CPU caches, it just means that the snoop bit in
the TLP cannot be enforced. ie the device *could* do no-snoop DMA
if it wants. Devices that never do no-snoop remain DMA coherent on
x86, as they always have been.
Yes, IOMMU_CAP_CACHE_COHERENCY=false means we cannot force the device
DMA to be coherent via the IOMMU.
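
To make the rule discussed above concrete, here is a minimal userspace sketch of the decision as I understand it from this thread. It is not kernel code: the struct and function names are hypothetical, and the fields only model the "snoop control" hardware flag and the DMA_PTE_SNP bit mentioned above.

```c
#include <stdbool.h>

/* Hypothetical model of one DMA transaction as seen on the bus. */
struct dma_request {
    bool tlp_no_snoop;      /* device set the no-snoop attribute in the TLP */
};

/* Hypothetical model of the IOMMU state for the IOVA being accessed. */
struct iommu_mapping {
    bool hw_snoop_control;  /* hardware supports per-page snoop control */
    bool pte_snp;           /* DMA_PTE_SNP-style bit set in this PTE */
};

/*
 * Returns true if the transaction ends up cache coherent under the
 * rule assumed in this thread:
 *  - with snoop control supported and the SNP bit set, the no-snoop
 *    attribute is ignored, so the DMA is coherent regardless;
 *  - otherwise the no-snoop attribute is honoured, so coherency
 *    depends on what the device asked for.
 */
bool dma_is_coherent(const struct iommu_mapping *m,
                     const struct dma_request *r)
{
    if (m->hw_snoop_control && m->pte_snp)
        return true;
    return !r->tlp_no_snoop;
}
```

In this model, a device that never sets no-snoop is coherent in every configuration, which matches the "as they always have been" remark above.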

IOMMU_CACHE does not mean the IOMMU is DMA cache coherent, it means
the PCI device is blocked from using no-snoop in its TLPs.

I wonder if ARM implemented this consistently? I see VDPA is
confused..


Basically, we don't want to bother with a pseudo KVM device the way VFIO did. So, for simplicity, vhost-vDPA rules out any IOMMU that can't enforce coherency when the parent depends purely on the platform IOMMU:


        if (!iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
                return -ENOTSUPP;

For parents that use their own translation logic, the implicit assumption is that the hardware always performs cache coherent DMA.
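
The two cases above can be summarised in a small userspace sketch. This is only a model of the policy, not the vhost-vDPA code: struct parent_model and vdpa_model_check_coherency() are hypothetical names, and since the kernel-internal ENOTSUPP is not available in userspace <errno.h>, the sketch returns -EOPNOTSUPP instead.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical description of a vDPA parent, for illustration only. */
struct parent_model {
    bool uses_platform_iommu;       /* relies purely on the platform IOMMU */
    bool iommu_enforces_coherency;  /* models IOMMU_CAP_CACHE_COHERENCY */
};

/*
 * Returns 0 if the parent is accepted, or a negative errno if not.
 * Parents with their own translation logic are assumed (not verified)
 * to perform cache coherent DMA, so they are always accepted.
 */
int vdpa_model_check_coherency(const struct parent_model *p)
{
    if (!p->uses_platform_iommu)
        return 0;                   /* assumed coherent by the hardware */

    if (!p->iommu_enforces_coherency)
        return -EOPNOTSUPP;         /* kernel snippet returns -ENOTSUPP */

    return 0;
}
```

The point of the design is that userspace never has to be trusted about no-snoop: either the IOMMU enforces coherency, or the device is refused up front.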

Thanks