Re: [RFC 0/4] Virtio uses DMA API for all devices

From: Christoph Hellwig
Date: Wed Aug 08 2018 - 08:30:45 EST


On Wed, Aug 08, 2018 at 08:07:49PM +1000, Benjamin Herrenschmidt wrote:
> Qemu virtio bypasses that iommu when the VIRTIO_F_IOMMU_PLATFORM flag
> is not set (default) but there's nothing in the device-tree to tell the
> guest about this since it's a violation of our pseries architecture, so
> we just rely on Linux virtio "knowing" that it happens. It's a bit
> yucky but that's now history...

That is ugly as hell, but it is how virtio works everywhere, so nothing
special so far.

> Essentially pseries "architecturally" does not have the concept of not
> having an iommu in the way and qemu violates that architecture today.
>
> (Remember it comes from pHyp, our priorietary HV, which we are somewhat
> mimmicing here).

It shouldnt be too hard to have a dt property that communicates this,
should it?

> So if we always set VIRTIO_F_IOMMU_PLATFORM, it *will* force all virtio
> through that iommu and performance will suffer (esp vhost I suspect),
> especially since adding/removing translations in the iommu is a
> hypercall.

Well, we'd nee to make sure that for this particular bus we skip the
actualy iommu.

> > It would not be the same effect. The problem with that is that you must
> > now assumes that your qemu knows that for example you might be passing
> > a dma offset if the bus otherwise requires it.
>
> I would assume that arch_virtio_wants_dma_ops() only returns true when
> no such offsets are involved, at least in our case that would be what
> happens.

That would work, but we're really piling hacÄs ontop of hacks here.

> > Or in other words:
> > you potentially break the contract between qemu and the guest of always
> > passing down physical addresses. If we explicitly change that contract
> > through using a flag that says you pass bus address everything is fine.
>
> For us a "bus address" is behind the iommu so that's what
> VIRTIO_F_IOMMU_PLATFORM does already. We don't have the concept of a
> bus address that is different. I suppose it's an ARMism to have DMA
> offsets that are separate from iommus ?

No, a lot of platforms support a bus address that has an offset from
the physical address. including a lot of power platforms:

arch/powerpc/kernel/pci-common.c: set_dma_offset(&dev->dev, PCI_DRAM_OFFSET);
arch/powerpc/platforms/cell/iommu.c: set_dma_offset(dev, cell_dma_nommu_offset);
arch/powerpc/platforms/cell/iommu.c: set_dma_offset(dev, addr);
arch/powerpc/platforms/powernv/pci-ioda.c: set_dma_offset(&pdev->dev, pe->tce_bypass_base);
arch/powerpc/platforms/powernv/pci-ioda.c: set_dma_offset(&pdev->dev, (1ULL << 32));
arch/powerpc/platforms/powernv/pci-ioda.c: set_dma_offset(&dev->dev, pe->tce_bypass_base);
arch/powerpc/platforms/pseries/iommu.c: set_dma_offset(dev, dma_offset);
arch/powerpc/sysdev/dart_iommu.c: set_dma_offset(&dev->dev, DART_U4_BYPASS_BASE);
arch/powerpc/sysdev/fsl_pci.c: set_dma_offset(dev, pci64_dma_offset);

to make things worse some platforms (at least on arm/arm64/mips/x86) can
also require additional banking where it isn't even a single linear map
but multiples windows.