Re: [PATCH v5 0/9] Use 1st-level for IOVA translation

From: Lu Baolu
Date: Wed Jan 01 2020 - 21:33:18 EST


Hi Yi,

On 1/2/20 10:31 AM, Liu, Yi L wrote:
From: Lu Baolu [mailto:baolu.lu@xxxxxxxxxxxxxxx]
Sent: Thursday, January 2, 2020 7:38 AM
To: Joerg Roedel <joro@xxxxxxxxxx>; David Woodhouse <dwmw2@xxxxxxxxxxxxx>;
Alex Williamson <alex.williamson@xxxxxxxxxx>
Subject: Re: [PATCH v5 0/9] Use 1st-level for IOVA translation

On 12/24/19 3:44 PM, Lu Baolu wrote:
Intel VT-d in scalable mode supports two types of page tables for DMA
translation: the first-level page table and the second-level page
table. The first-level page table uses the same format as the CPU page
table, while the second-level page table remains compatible with
previous formats. Software can choose either of them for DMA
remapping, depending on the use case.

This patchset aims to move IOVA (I/O Virtual Address) translation to
the 1st-level page table in scalable mode. This will simplify the vIOMMU
(IOMMU simulated by the VM hypervisor) design by using two-stage
translation, a.k.a. nested-mode translation.

As the Intel VT-d architecture offers caching mode, guest IOVA (GIOVA)
support is currently implemented in a shadow page table manner. The
device simulation software, such as QEMU, has to figure out the
GIOVA->GPA mappings and write the corresponding GIOVA->HPA mappings to a
shadow page table, which is used by the physical IOMMU. Each time
mappings are created or destroyed in the vIOMMU, the simulation software
has to intervene so that the changes to GIOVA->GPA can be shadowed to
the host.


.-----------.
|  vIOMMU   |
|-----------|                 .--------------------.
|           |IOTLB flush trap |        QEMU        |
.-----------.   (map/unmap)   |--------------------|
|GIOVA->GPA |---------------->|   .------------.   |
'-----------'                 |   | GIOVA->HPA |   |
|           |                 |   '------------'   |
'-----------'                 |                    |
                              |                    |
                              '--------------------'
                                         |
             <----------------------------
             |
             v VFIO/IOMMU API
       .-----------.
       |  pIOMMU   |
       |-----------|
       |           |
       .-----------.
       |GIOVA->HPA |
       '-----------'
       |           |
       '-----------'
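To make the cost of shadowing concrete, here is a minimal toy model in C,
assuming page-frame-granular, fixed-size flat tables. All names
(struct toy_table, shadow_on_map(), shadow_on_unmap()) are hypothetical and
are not QEMU, VFIO or intel-iommu APIs. The only point it illustrates is
that every trapped guest map or unmap forces the emulator to compose
GIOVA->GPA with GPA->HPA and update the GIOVA->HPA table used by the pIOMMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: page frame numbers only, tiny flat tables.
 * Hypothetical names; not QEMU or kernel APIs. */
#define TOY_ENTRIES 16
#define TOY_INVALID UINT64_MAX

struct toy_table {
	uint64_t out[TOY_ENTRIES];	/* index = input frame, value = output frame */
};

static void toy_table_init(struct toy_table *t)
{
	for (size_t i = 0; i < TOY_ENTRIES; i++)
		t->out[i] = TOY_INVALID;
}

static uint64_t toy_lookup(const struct toy_table *t, uint64_t in)
{
	return in < TOY_ENTRIES ? t->out[in] : TOY_INVALID;
}

/* Runs on every trapped guest map: compose GIOVA->GPA with GPA->HPA
 * and install the result in the shadow table used by the pIOMMU. */
static bool shadow_on_map(const struct toy_table *guest,	/* GIOVA->GPA */
			  const struct toy_table *gpa_to_hpa,	/* known to the host */
			  struct toy_table *shadow,		/* GIOVA->HPA */
			  uint64_t giova)
{
	uint64_t gpa = toy_lookup(guest, giova);
	uint64_t hpa = toy_lookup(gpa_to_hpa, gpa);

	if (giova >= TOY_ENTRIES || hpa == TOY_INVALID)
		return false;
	shadow->out[giova] = hpa;
	return true;
}

/* Runs on every trapped guest unmap: drop the shadow entry as well. */
static void shadow_on_unmap(struct toy_table *shadow, uint64_t giova)
{
	if (giova < TOY_ENTRIES)
		shadow->out[giova] = TOY_INVALID;
}
```

Because both map and unmap go through this path, every guest mapping
change costs a trap into the emulator plus an update of the host table.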

In VT-d 3.0, scalable mode is introduced, which offers two-level
translation page tables and a nested translation mode. With regard to
GIOVA support, it can be simplified by 1) moving GIOVA support onto the
1st-level page table to store GIOVA->GPA mappings in the vIOMMU,
2) binding the vIOMMU 1st-level page table to the pIOMMU, 3) using the
pIOMMU second level for GPA->HPA translation, and 4) enabling nested
(a.k.a. dual-stage) translation in the host. Compared with the current
shadow GIOVA support, the new approach makes the vIOMMU design simpler
and more efficient, as we only need to flush the pIOMMU IOTLB and
possibly the device-IOTLB when an IOVA mapping in the vIOMMU is torn
down.

.-----------.
|  vIOMMU   |
|-----------|                 .-----------.
|           |IOTLB flush trap |   QEMU    |
.-----------.     (unmap)     |-----------|
|GIOVA->GPA |---------------->|           |
'-----------'                 '-----------'
|           |                       |
'-----------'                       |
             <-----------------------
             |     VFIO/IOMMU
             | cache invalidation and
             | guest gpd bind interfaces
             v
       .-----------.
       |  pIOMMU   |
       |-----------|
       .-----------.
       |GIOVA->GPA |<---First level
       '-----------'
       | GPA->HPA  |<---Second level
       '-----------'
       '-----------'
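A similarly simplified sketch of the nested alternative, again with
hypothetical names and flat toy tables rather than real multi-level VT-d
page tables: the pIOMMU walks the guest-owned first level (GIOVA->GPA) and
then the host-owned second level (GPA->HPA) at translation time, caching
the result in a modeled IOTLB. Since this toy IOTLB caches only successful
translations, a guest map needs no host involvement, while a guest unmap
leaves a stale IOTLB entry until it is flushed. That is why only the unmap
path needs the invalidation trap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; not the intel-iommu driver's data structures. */
#define TOY_ENTRIES 16
#define TOY_INVALID UINT64_MAX

struct toy_table {
	uint64_t out[TOY_ENTRIES];	/* index = input frame, value = output frame */
};

struct toy_piommu {
	const struct toy_table *first;	/* bound guest table: GIOVA->GPA */
	const struct toy_table *second;	/* host table: GPA->HPA */
	uint64_t iotlb[TOY_ENTRIES];	/* cached GIOVA->HPA results */
};

static void toy_table_init(struct toy_table *t)
{
	for (size_t i = 0; i < TOY_ENTRIES; i++)
		t->out[i] = TOY_INVALID;
}

static uint64_t toy_lookup(const struct toy_table *t, uint64_t in)
{
	return in < TOY_ENTRIES ? t->out[in] : TOY_INVALID;
}

static void toy_piommu_init(struct toy_piommu *p,
			    const struct toy_table *first,
			    const struct toy_table *second)
{
	p->first = first;
	p->second = second;
	for (size_t i = 0; i < TOY_ENTRIES; i++)
		p->iotlb[i] = TOY_INVALID;
}

/* Nested walk: stage 1 (guest-owned) then stage 2 (host-owned). */
static uint64_t toy_translate(struct toy_piommu *p, uint64_t giova)
{
	if (giova >= TOY_ENTRIES)
		return TOY_INVALID;
	if (p->iotlb[giova] != TOY_INVALID)
		return p->iotlb[giova];			/* IOTLB hit, possibly stale */

	uint64_t gpa = toy_lookup(p->first, giova);	/* GIOVA->GPA */
	uint64_t hpa = toy_lookup(p->second, gpa);	/* GPA->HPA */

	if (hpa != TOY_INVALID)
		p->iotlb[giova] = hpa;			/* cache only valid results */
	return hpa;
}

/* What the trapped guest IOTLB flush turns into on the host side. */
static void toy_iotlb_flush(struct toy_piommu *p, uint64_t giova)
{
	if (giova < TOY_ENTRIES)
		p->iotlb[giova] = TOY_INVALID;
}
```

For example, after the guest clears its first-level entry for a GIOVA,
toy_translate() keeps returning the old HPA until toy_iotlb_flush() is
called for that GIOVA.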

This patchset uses the first-level page table for IOVA translation
unless the DOMAIN_ATTR_NESTING domain attribute has been set.
Setting this attribute means the second level will be used to map
gPA (guest physical address) to hPA (host physical address), and the
mappings between gVA (guest virtual address) and gPA will be
maintained by the guest, with its page table address bound to the
host's first level.
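The per-domain choice described above can be sketched as follows. This is
not the intel-iommu driver code: struct toy_domain and toy_pick_iova_table()
are hypothetical, the nesting flag merely stands in for DOMAIN_ATTR_NESTING,
and falling back to the second level when scalable mode is off is an
assumption of this sketch (it models the legacy-compatible format being the
only one available there).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; not the intel-iommu driver's actual code. */
enum toy_pgtable_level {
	TOY_FIRST_LEVEL,	/* CPU page table format */
	TOY_SECOND_LEVEL,	/* legacy-compatible format */
};

struct toy_domain {
	bool nesting;		/* models DOMAIN_ATTR_NESTING being set */
	bool scalable_mode;	/* models VT-d scalable mode being enabled */
};

static enum toy_pgtable_level toy_pick_iova_table(const struct toy_domain *d)
{
	/* Assumption of this sketch: no scalable mode, no first level. */
	if (!d->scalable_mode)
		return TOY_SECOND_LEVEL;
	/* Nesting: the host keeps the second level for gPA->hPA and the
	 * guest's own table is later bound to the first level. */
	if (d->nesting)
		return TOY_SECOND_LEVEL;
	/* Default in scalable mode: IOVA goes through the first level. */
	return TOY_FIRST_LEVEL;
}
```

Keeping the decision in one helper makes the nesting case explicit: the
second level is reserved for gPA->hPA so that the guest's first-level table
can be bound on top of it.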

Based-on-idea-by: Ashok Raj <ashok.raj@xxxxxxxxx>
Based-on-idea-by: Kevin Tian <kevin.tian@xxxxxxxxx>
Based-on-idea-by: Liu Yi L <yi.l.liu@xxxxxxxxx>
Based-on-idea-by: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
Based-on-idea-by: Sanjay Kumar <sanjay.k.kumar@xxxxxxxxx>
Based-on-idea-by: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>

Queued all patches for v5.6.

Reviewed-by: Liu Yi L <yi.l.liu@xxxxxxxxx>

Aha, it looks like I forgot to give my Reviewed-by after the offline review.
Yeah, this patchset looks good to me.

Thank you, Yi. I really appreciate your time.

Best regards,
-baolu