Re: [ANNOUNCE] VFIO V6 & public VFIO repositories

From: Benjamin Herrenschmidt
Date: Tue Dec 21 2010 - 16:44:39 EST


On Tue, 2010-12-21 at 11:48 -0800, Tom Lyon wrote:
>
> Ben - I don't have any good news for you.
>
> DMA remappers like on Power and Sparc have been around forever, the new thing
> about Intel/AMD iommus is the per-device address spaces and the protection
> inherent in having separate mappings for each device. If one is to trust a
> user level app or virtual machine to program DMA registers directly, then you
> really need per device translation.

Right, and we had that for a while too on our PCIe variants :-)

I.e., we have a single address space, -but- that address space is divided
into windows that have an individual filter on the transaction requester
IDs (which I can configure to filter a full bus, a full device, or
pretty much per function). I have a pile of such windows (depending on
the exact chipset, up to 256 today).
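
As a rough illustration of the filtering described above, here is a minimal
sketch in C. It assumes the standard PCI requester ID layout (bus in bits
15:8, device in bits 7:3, function in bits 2:0). The struct, enum, and
function names are hypothetical and do not come from any real chipset
driver; the actual hardware filter may be programmed quite differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filter granularity; names are illustrative only. */
enum rid_match { RID_MATCH_BUS, RID_MATCH_DEVICE, RID_MATCH_FUNCTION };

/* Hypothetical per-window filter state. */
struct dma_window {
    uint16_t rid;          /* requester ID the window is bound to */
    enum rid_match match;  /* how much of the requester ID is compared */
};

/* Build a PCI requester ID: bus[15:8], device[7:3], function[2:0]. */
static uint16_t make_rid(unsigned bus, unsigned dev, unsigned fn)
{
    assert(bus < 256 && dev < 32 && fn < 8);
    return (uint16_t)((bus << 8) | (dev << 3) | fn);
}

/* Return true if a transaction from 'rid' passes the window's filter. */
static bool window_accepts(const struct dma_window *w, uint16_t rid)
{
    uint16_t mask;

    switch (w->match) {
    case RID_MATCH_BUS:
        mask = 0xff00;  /* compare the bus number only */
        break;
    case RID_MATCH_DEVICE:
        mask = 0xfff8;  /* compare bus and device, any function */
        break;
    default:
        mask = 0xffff;  /* compare the full bus/device/function */
        break;
    }
    return ((w->rid ^ rid) & mask) == 0;
}
```

With a window bound to bus 1, device 2 at device granularity, any function
of that device passes and every other device is rejected.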

So essentially, each device -does- have separate mappings, though those are
limited to a "window" of the address space which is typically going to
be around 256M (or smaller) in 32-bit space (but can be much larger in
64-bit space depending on how much physically contiguous space we can
spare for the translation table itself).
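
To make the link between window size and table size concrete, here is a
small arithmetic sketch in C. It assumes a single-level table with one
fixed-size entry per I/O page. The 8-byte entry size and the 4K page size
used in the note after the code are assumptions for illustration, not
figures from this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of physically contiguous memory needed for a single-level
 * translation table covering 'window_bytes' of DMA address space,
 * with one entry of 'entry_bytes' per page of 'page_bytes'.
 */
static uint64_t table_bytes(uint64_t window_bytes, uint64_t page_bytes,
                            uint64_t entry_bytes)
{
    assert(page_bytes != 0);
    return (window_bytes / page_bytes) * entry_bytes;
}
```

Under those assumptions a 256M window needs a 512K table, while a 1T
window in 64-bit space would need 2G of contiguous memory, which is why
the size of the table bounds the size of the window.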

Now, it doesn't do multi-level translations. So KVM guests (or userspace
applications) will not directly modify the translation table. That does
mean map/unmap "ioctls" for userspace. In the KVM case, hypercalls.
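
A minimal sketch of what such a map/unmap path might check, written in C.
Everything here is hypothetical: the struct layout, the valid bit, the 4K
page size, and the function names are assumptions for illustration, not
the VFIO ioctl interface or the actual hypercall ABI. The point is only
that the kernel or hypervisor owns the table and rejects any request that
falls outside the device's window.

```c
#include <assert.h>
#include <stdint.h>

#define WIN_PAGE_SHIFT 12                 /* assumed 4K I/O pages */
#define WIN_PAGE_MASK  ((1ULL << WIN_PAGE_SHIFT) - 1)
#define WIN_ENTRY_VALID 1ULL              /* assumed valid bit */

/* Hypothetical window descriptor with a single-level table. */
struct dma_window {
    uint64_t base;    /* first bus address covered by the window */
    uint64_t npages;  /* number of pages (and table entries) */
    uint64_t *table;  /* one entry per page, owned by the kernel */
};

/* Convert a bus address to a table index, or return -1 if invalid. */
static int window_index(const struct dma_window *w, uint64_t bus_addr,
                        uint64_t *idx)
{
    assert(w && w->table);
    if ((bus_addr & WIN_PAGE_MASK) || bus_addr < w->base)
        return -1;
    *idx = (bus_addr - w->base) >> WIN_PAGE_SHIFT;
    return *idx < w->npages ? 0 : -1;
}

/* Map one page; fails if the request is outside the window. */
static int window_map(struct dma_window *w, uint64_t bus_addr,
                      uint64_t phys_addr)
{
    uint64_t idx;

    if ((phys_addr & WIN_PAGE_MASK) || window_index(w, bus_addr, &idx))
        return -1;
    w->table[idx] = phys_addr | WIN_ENTRY_VALID;
    return 0;
}

/* Unmap one page by clearing its entry. */
static int window_unmap(struct dma_window *w, uint64_t bus_addr)
{
    uint64_t idx;

    if (window_index(w, bus_addr, &idx))
        return -1;
    w->table[idx] = 0;
    return 0;
}
```

The bounds check in window_index() is also where the window's base and
size become visible to the caller, since a mapping request for an address
outside them simply fails.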

This is not a huge deal for us right now as our operating environment is
already paravirtualized (for running under pHyp aka PowerVM aka IBM
proprietary hypervisor). So we just implement the same hypercalls in KVM
and existing kernels will "just work". Not as efficient as direct access
into a multi-level page table but still better than nothing :-)

> That said, early versions of VFIO had a mapping mode that used the normal DMA
> API instead of the iommu/uiommu api and assumed that the user was trusted, but
> that wasn't interesting for the long term.
>
> So if you want safe device assignment you're going to need hardware help.

Well, there are going to be some amount of changes in future HW but
that's not something we can count on today and we have to support
existing machines. That said, as I wrote above, I -do- have per-device
assignment, however, I don't get to give an entire 64-bit address space
to each of them, only a "window" in a single address space, so I need
to somehow convey those boundaries to userspace.

There's also a mismatch with the concept of creating an iommu domain,
and then attaching devices to it (which kvm intends to exploit, Alex was
explaining that his plan is to put all devices in a partition inside the
same domain). In our case, the domains are pretty much pre-existing and
tied to each device. But this is more an API mismatch specific to
uiommu.

Cheers,
Ben.
