RE: [RFC PATCH 0/4] Use 1st-level for DMA remapping in guest

From: Tian, Kevin
Date: Tue Sep 24 2019 - 03:00:28 EST


> From: Raj, Ashok
> Sent: Tuesday, September 24, 2019 4:26 AM
>
> Hi Jacob
>
> On Mon, Sep 23, 2019 at 12:27:15PM -0700, Jacob Pan wrote:
> > >
> > > In VT-d 3.0, scalable mode is introduced, which offers two level
> > > translation page tables and nested translation mode. Regards to
> > > GIOVA support, it can be simplified by 1) moving the GIOVA support
> > > over 1st-level page table to store GIOVA->GPA mapping in vIOMMU,
> > > 2) binding vIOMMU 1st level page table to the pIOMMU, 3) using pIOMMU
> > > second level for GPA->HPA translation, and 4) enable nested (a.k.a.
> > > dual stage) translation in host. Compared with current shadow GIOVA
> > > support, the new approach is more secure and software is simplified
> > > as we only need to flush the pIOMMU IOTLB and possible device-IOTLB
> > > when an IOVA mapping in vIOMMU is torn down.
> > >
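
To make the four steps above concrete, here is a toy userspace model of
the nested walk (GIOVA->GPA in the first level, GPA->HPA in the second
level). It is only a sketch, not kernel code: every toy_* name is made
up for illustration, each "page table" is a flat array, and it ignores
that real nested translation also runs the first-level table pointers
through the second level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((1ULL << TOY_PAGE_SHIFT) - 1)
#define TOY_NR_PAGES   8
#define TOY_INVALID    UINT64_MAX

/* One translation stage: index = input page frame, value = output page frame. */
struct toy_stage {
	uint64_t pfn[TOY_NR_PAGES];
};

/* Start with every entry not present. */
static void toy_stage_init(struct toy_stage *s)
{
	for (int i = 0; i < TOY_NR_PAGES; i++)
		s->pfn[i] = TOY_INVALID;
}

static void toy_stage_map(struct toy_stage *s, uint64_t in_pfn, uint64_t out_pfn)
{
	assert(in_pfn < TOY_NR_PAGES);
	s->pfn[in_pfn] = out_pfn;
}

static void toy_stage_unmap(struct toy_stage *s, uint64_t in_pfn)
{
	assert(in_pfn < TOY_NR_PAGES);
	s->pfn[in_pfn] = TOY_INVALID;
}

/* Walk a single stage; returns false if the address is not mapped (a fault). */
static bool toy_stage_walk(const struct toy_stage *s, uint64_t in, uint64_t *out)
{
	uint64_t pfn = in >> TOY_PAGE_SHIFT;

	if (pfn >= TOY_NR_PAGES || s->pfn[pfn] == TOY_INVALID)
		return false;
	*out = (s->pfn[pfn] << TOY_PAGE_SHIFT) | (in & TOY_PAGE_MASK);
	return true;
}

/*
 * Nested walk: the first level (owned by the guest) gives GIOVA->GPA,
 * the second level (owned by the host) gives GPA->HPA.
 */
static bool toy_nested_walk(const struct toy_stage *fl, const struct toy_stage *sl,
			    uint64_t giova, uint64_t *hpa)
{
	uint64_t gpa;

	return toy_stage_walk(fl, giova, &gpa) && toy_stage_walk(sl, gpa, hpa);
}
```

In this model the guest only ever touches the first-level stage, so a
guest unmap is just toy_stage_unmap() on that stage. Nothing has to be
re-shadowed on the host side, which is why only the flush needs to be
forwarded.
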
> > >      .-----------.
> > >      |  vIOMMU   |
> > >      |-----------|                 .-----------.
> > >      |           |IOTLB flush trap |   QEMU    |
> > >      .-----------.    (unmap)      |-----------|
> > >      | GVA->GPA  |---------------->|           |
> > >      '-----------'                 '-----------'

GVA should be replaced by GIOVA in all the figures.

> > >      |           |                       |
> > >      '-----------'                       |
> > >            <------------------------------
> > >            |      VFIO/IOMMU
> > >            |  cache invalidation and
> > >            | guest gpd bind interfaces
> > >            v
> > For vSVA, the guest PGD bind interface will mark the PASID as guest
> > PASID and will inject page request into the guest. In FL gIOVA case, I
> > guess we are assuming there is no page fault for GIOVA. I will need to
> > add a flag in the gpgd bind such that any PRS will be auto responded
> > with invalid.
>
> Is there real need to enforce this? I'm not sure if there is any
> limitation in the spec, and if so, can the guest check that instead?

Whether to allow page faults is not usage specific (GIOVA, GVA, etc.).
It's really about the device capability and the IOMMU capability. VT-d
allows page faults on both levels, so we don't need to enforce it.

btw, in the future we may need an interface to tell VFIO whether a
device is 100% DMA-faultable, so that pinning can be avoided. But for
now I'm not sure how that can be determined without device-specific
knowledge. The PCI PRI capability only indicates that the device
supports page faults, not that every DMA access it issues is
faultable. Maybe we need a new bit in the PRI capability for that
purpose.
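
Roughly what I have in mind, as a sketch only. None of these fields
exist in VFIO or in the PCI PRI capability today; in particular
all_dma_faultable stands in for the hypothetical new bit, and the
toy_* names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device summary; not an existing kernel structure. */
struct toy_dev_caps {
	bool pri_supported;     /* device exposes the PCI PRI capability */
	bool pri_enabled;       /* PRI has been enabled on the device */
	bool all_dma_faultable; /* hypothetical bit: every DMA access may fault */
};

/*
 * Guest memory can stay unpinned only if every DMA access the device
 * issues is able to take a recoverable page fault. PRI support alone
 * is not enough, because some accesses may still be non-faultable.
 */
static bool toy_needs_pinning(const struct toy_dev_caps *caps)
{
	assert(caps != NULL);
	return !(caps->pri_supported && caps->pri_enabled &&
		 caps->all_dma_faultable);
}
```

With only pri_supported and pri_enabled set, this still returns true,
which is the case we are in today.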

>
> Also i believe the idea is to overcommit PASID#0 such uses. Thought
> we had a capability to expose this to the vIOMMU as well. Not sure if this
> is already documented, if not should be up in the next rev.
>
>
> >
> > Also, native use of IOVA FL map is not to be supported? i.e. IOMMU API
> > and DMA API for native usage will continue to be SL only?
> > >      .-----------.
> > >      |  pIOMMU   |
> > >      |-----------|
> > >      .-----------.
> > >      | GVA->GPA  |<---First level
> > >      '-----------'
> > >      | GPA->HPA  |<---Scond level
>
> s/Scond/Second
>
> > >      '-----------'
> > >      '-----------'
> > >
> > > This patch series only aims to achieve the first goal, a.k.a using

First goal? Then what are the other goals? I didn't spot that information.

Also, earlier you mentioned that the new approach (nested) is more secure
than shadowing. Why?

> > > first level translation for IOVA mappings in vIOMMU. I am sending
> > > it out for your comments. Any comments, suggestions and concerns are
> > > welcomed.
> > >
> >
> >
> > > Based-on-idea-by: Ashok Raj <ashok.raj@xxxxxxxxx>
> > > Based-on-idea-by: Kevin Tian <kevin.tian@xxxxxxxxx>
> > > Based-on-idea-by: Liu Yi L <yi.l.liu@xxxxxxxxx>
> > > Based-on-idea-by: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> > > Based-on-idea-by: Sanjay Kumar <sanjay.k.kumar@xxxxxxxxx>
> > >
> > > Lu Baolu (4):
> > > iommu/vt-d: Move domain_flush_cache helper into header
> > > iommu/vt-d: Add first level page table interfaces
> > > iommu/vt-d: Map/unmap domain with mmmap/mmunmap
> > > iommu/vt-d: Identify domains using first level page table
> > >
> > > drivers/iommu/Makefile             |   2 +-
> > > drivers/iommu/intel-iommu.c        | 142 ++++++++++--
> > > drivers/iommu/intel-pgtable.c      | 342 +++++++++++++++++++++++++++++
> > > include/linux/intel-iommu.h        |  31 ++-
> > > include/trace/events/intel_iommu.h |  60 +++++
> > > 5 files changed, 553 insertions(+), 24 deletions(-)
> > > create mode 100644 drivers/iommu/intel-pgtable.c
> > >
> >
> > [Jacob Pan]