Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

From: Jike Song
Date: Mon Nov 23 2015 - 00:06:26 EST


On 11/21/2015 01:25 AM, Alex Williamson wrote:
On Fri, 2015-11-20 at 08:10 +0000, Tian, Kevin wrote:

Here is a more concrete example:

KVMGT doesn't require IOMMU. All DMA targets are already replaced with
HPA thru shadow GTT. So DMA requests from GPU all contain HPAs.

When IOMMU is enabled, one simple approach is to have vGPU IOMMU
driver configure system IOMMU with identity mapping (HPA->HPA). We
can't use (GPA->HPA) since GPAs from multiple VMs are conflicting.

However, we still have host gfx driver running. When IOMMU is enabled,
dma_alloc_*** will return IOVA (drivers/iommu/iova.c) in host gfx driver,
which will have IOVA->HPA programmed to system IOMMU.

One IOMMU device entry can only translate one address space, so here
comes a conflict (HPA->HPA vs. IOVA->HPA). To solve this, vGPU IOMMU
driver needs to allocate IOVA from iova.c for each VM w/ vGPU assigned,
and then KVMGT will program IOVA in shadow GTT accordingly. It adds
one additional mapping layer (GPA->IOVA->HPA). In this way two
requirements can be unified together since only IOVA->HPA mapping
needs to be built.
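
To make the conflict concrete, here is a toy Python model (userspace, not kernel code) that treats one IOMMU device entry as a single input-address -> HPA table. The class names `Iommu` and `IovaAllocator`, the function names, and all addresses are hypothetical and chosen only for illustration; the real allocator in drivers/iommu/iova.c differs in detail. The sketch shows an identity mapping colliding with an IOVA the host gfx driver already holds, and the collision going away once guest pages also get their IOVAs from the same allocator.

```python
PAGE = 0x1000


class Iommu:
    """One device entry translates one address space: input address -> HPA."""

    def __init__(self):
        self.table = {}

    def map(self, input_addr, hpa):
        existing = self.table.get(input_addr)
        if existing is not None and existing != hpa:
            raise ValueError(
                f"conflict at {input_addr:#x}: already maps to {existing:#x}"
            )
        self.table[input_addr] = hpa

    def translate(self, input_addr):
        return self.table[input_addr]


class IovaAllocator:
    """Hands out unique page-sized IOVAs (a simplified stand-in for iova.c)."""

    def __init__(self, base):
        self.next = base

    def alloc(self):
        iova = self.next
        self.next += PAGE
        return iova


def identity_conflicts():
    """HPA->HPA for a guest page collides with a host IOVA->HPA entry."""
    iommu = Iommu()
    allocator = IovaAllocator(base=0x5000)
    # Host gfx driver maps a buffer at HPA 0x9000 and is handed IOVA 0x5000.
    iommu.map(allocator.alloc(), 0x9000)
    # The vGPU side now wants identity mapping for a guest page at HPA 0x5000.
    try:
        iommu.map(0x5000, 0x5000)
    except ValueError:
        return True
    return False


def unified():
    """Both users allocate from the same IOVA space, so entries never clash."""
    iommu = Iommu()
    allocator = IovaAllocator(base=0x5000)
    host_iova = allocator.alloc()
    iommu.map(host_iova, 0x9000)
    guest_iova = allocator.alloc()
    iommu.map(guest_iova, 0x5000)
    shadow_gtt = {0x2000: guest_iova}  # GPA -> IOVA, as the shadow GTT would hold
    return iommu.translate(shadow_gtt[0x2000]), host_iova, guest_iova
```

In this model `identity_conflicts()` reports the clash, while `unified()` resolves GPA 0x2000 through the shadow GTT and the single IOVA->HPA table to the guest page's HPA.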

So unlike the existing type1 IOMMU driver, which controls the IOMMU alone,
the vGPU IOMMU driver needs to cooperate with another agent (iova.c here)
to co-manage the system IOMMU. This may not impact the existing VFIO
framework. I just want to highlight the additional work needed when
implementing the vGPU IOMMU driver.

Right, so the existing i915 driver needs to use the DMA API and calls
like dma_map_page() to enable translations through the IOMMU. With
dma_map_page(), the caller provides a page address (~HPA) and is
returned an IOVA. So unfortunately you don't get to take the shortcut
of having an identity mapping through the IOMMU unless you want to
convert i915 entirely to using the IOMMU API, because we also can't have
the conflict that an HPA could overlap an IOVA for a previously mapped
page.

The double translation, once through the GPU MMU and once through the
system IOMMU is going to happen regardless of whether we can identity
map through the IOMMU. The only solution to this would be for the GPU
to participate in ATS and provide pre-translated transactions from the
GPU. All of this is internal to the i915 driver (or vfio extension of
that driver) and needs to be done regardless of what sort of interface
we're using to expose the vGPU to QEMU. It just seems like VFIO
provides a convenient way of doing this since you'll have ready access
to the HVA-GPA mappings for the user.

I think the key points though are:

* the VFIO type1 IOMMU stores GPA to HVA translations
* get_user_pages() on the HVA will pin the page and give you a page
* dma_map_page() receives that page, programs the system IOMMU and
  provides an IOVA
* the GPU MMU can then be programmed with the GPA to IOVA translations
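
The four steps above can be sketched as a small Python model (userspace, not kernel code). Every class and method name here is hypothetical: `Type1Store.lookup`, `Host.pin_user_page` and `SystemIommu.map_page` only stand in for the type1 GPA->HVA store, get_user_pages() and dma_map_page() respectively, and do not reflect their real signatures. Addresses are made up.

```python
PAGE_SIZE = 0x1000


class Type1Store:
    """GPA -> HVA translations, as recorded by the VFIO type1 IOMMU backend."""

    def __init__(self):
        self.gpa_to_hva = {}

    def add(self, gpa, hva):
        self.gpa_to_hva[gpa] = hva

    def lookup(self, gpa):
        return self.gpa_to_hva[gpa]


class Host:
    """Host memory model: which page (HPA) backs each HVA, plus pin counts."""

    def __init__(self, hva_to_hpa):
        self.hva_to_hpa = dict(hva_to_hpa)
        self.pinned = {}

    def pin_user_page(self, hva):
        # Stand-in for get_user_pages(): pin the backing page and return it.
        hpa = self.hva_to_hpa[hva]
        self.pinned[hpa] = self.pinned.get(hpa, 0) + 1
        return hpa


class SystemIommu:
    """System IOMMU model holding the IOVA -> HPA table."""

    def __init__(self, iova_base):
        self.next_iova = iova_base
        self.iova_to_hpa = {}

    def map_page(self, hpa):
        # Stand-in for dma_map_page(): program the IOMMU, return the IOVA.
        iova = self.next_iova
        self.next_iova += PAGE_SIZE
        self.iova_to_hpa[iova] = hpa
        return iova


def map_guest_page(gpa, store, host, iommu, gpu_mmu):
    """Run the four steps for one guest page and return its IOVA."""
    hva = store.lookup(gpa)           # 1. GPA -> HVA from the type1 store
    page = host.pin_user_page(hva)    # 2. pin the page behind the HVA
    iova = iommu.map_page(page)       # 3. program system IOMMU, get IOVA
    gpu_mmu[gpa] = iova               # 4. program GPU MMU with GPA -> IOVA
    return iova
```

A usage example under the same assumptions: after `map_guest_page(0x2000, store, host, iommu, gpu_mmu)`, a GPU access to GPA 0x2000 resolves through `gpu_mmu` to an IOVA, and through `iommu.iova_to_hpa` to the pinned host page.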

Thanks for such a nice example! I'll do my homework and get back to you
shortly :)


Thanks,
Alex


--
Thanks,
Jike