RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to userspace

From: Liu, Yi L
Date: Wed Apr 01 2020 - 03:38:08 EST


> From: Tian, Kevin <kevin.tian@xxxxxxxxx>
> Sent: Monday, March 30, 2020 7:49 PM
> To: Liu, Yi L <yi.l.liu@xxxxxxxxx>; alex.williamson@xxxxxxxxxx;
> Subject: RE: [PATCH v1 5/8] vfio/type1: Report 1st-level/stage-1 format to
> userspace
>
> > From: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> > Sent: Sunday, March 22, 2020 8:32 PM
> >
> > From: Liu Yi L <yi.l.liu@xxxxxxxxx>
> >
> > VFIO exposes IOMMU nesting translation (a.k.a dual stage translation)
> > capability to userspace. Thus applications like QEMU could support
> > vIOMMU with hardware's nesting translation capability for pass-through
> > devices. Before setting up nesting translation for pass-through
> > devices, QEMU and other applications need to learn the supported
> > 1st-lvl/stage-1 translation structure format like page table format.
> >
> > Take vSVA (virtual Shared Virtual Addressing) as an example: to
> > support vSVA for pass-through devices, QEMU sets up nesting
> > translation for those devices. The guest page tables are configured
> > on the host as the 1st-lvl/stage-1 page tables. Therefore, the guest
> > format must be compatible with the host side.
> >
> > This patch reports the supported 1st-lvl/stage-1 page table format on
> > the current platform to userspace. QEMU and similar applications
> > should use this format info when trying to set up IOMMU nesting
> > translation on the host IOMMU.
> >
> > Cc: Kevin Tian <kevin.tian@xxxxxxxxx>
> > CC: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> > Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > Cc: Eric Auger <eric.auger@xxxxxxxxxx>
> > Cc: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> > Signed-off-by: Liu Yi L <yi.l.liu@xxxxxxxxx>
> > ---
> > drivers/vfio/vfio_iommu_type1.c | 56 +++++++++++++++++++++++++++++++++++++++++
> > include/uapi/linux/vfio.h | 1 +
> > 2 files changed, 57 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 9aa2a67..82a9e0b 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
> > return ret;
> > }
> >
> > +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu,
> > + u32 *stage1_format)
> > +{
> > + struct vfio_domain *domain;
> > + u32 format = 0, tmp_format = 0;
> > + int ret;
> > +
> > + mutex_lock(&iommu->lock);
> > + if (list_empty(&iommu->domain_list)) {
> > + mutex_unlock(&iommu->lock);
> > + return -EINVAL;
> > + }
> > +
> > + list_for_each_entry(domain, &iommu->domain_list, next) {
> > + if (iommu_domain_get_attr(domain->domain,
> > + DOMAIN_ATTR_PASID_FORMAT, &format)) {
> > + ret = -EINVAL;
> > + format = 0;
> > + goto out_unlock;
> > + }
> > + /*
> > + * format is always non-zero (the first format is
> > + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For
> > + * the reason of potential different backed IOMMU
> > + * formats, here we expect to have identical formats
> > + * in the domain list, no mixed formats support.
> > + * return -EINVAL to fail the attempt of setup
> > + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats
> > + * are detected.
> > + */
> > + if (tmp_format && tmp_format != format) {
> > + ret = -EINVAL;
> > + format = 0;
> > + goto out_unlock;
> > + }
> > +
> > + tmp_format = format;
> > + }
>
> This function is invoked only in the VFIO_IOMMU_GET_INFO path. If we don't
> want to assume the status quo that one container holds only one device w/
> vIOMMU (the prerequisite for vSVA), it looks like we also need to check the
> format compatibility when attaching a new group to this container?

Right. When attaching to a nesting-type container (indicated by the
vfio_iommu.nesting bit), the attach path should check whether the new
domain's format is compatible with the prior domains in the domain list.
If it is the first one attached to this container, no check is needed.
Does that sound good?
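
To make the proposed attach-time check concrete, here is a rough
userspace sketch of the logic, not the actual kernel change. The
struct domain_model type and the check_stage1_format_compat() helper
are hypothetical names used only for illustration; in the real code the
format would come from iommu_domain_get_attr() with
DOMAIN_ATTR_PASID_FORMAT, as in the patch above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a domain in the container's domain list. */
struct domain_model {
	uint32_t stage1_format;	/* non-zero for any valid format */
};

/*
 * Return 0 if a domain with new_format may be attached to a nesting
 * container that already holds the given domains, or -EINVAL if the
 * format is invalid or differs from an existing domain's format.
 */
int check_stage1_format_compat(const struct domain_model *domains,
			       size_t nr_domains, uint32_t new_format)
{
	size_t i;

	/* An empty list needs no array; a non-empty one must have it. */
	assert(domains != NULL || nr_domains == 0);

	if (new_format == 0)
		return -EINVAL;	/* valid formats are always non-zero */

	for (i = 0; i < nr_domains; i++) {
		if (domains[i].stage1_format != new_format)
			return -EINVAL;	/* no mixed-format support */
	}

	/* Also covers the first attach, where nr_domains is 0. */
	return 0;
}
```

With this shape the first attach always passes, and any later attach
fails with -EINVAL as soon as one existing domain reports a different
format.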

> > + ret = 0;
> > +
> > +out_unlock:
> > + if (format)
> > + *stage1_format = format;
> > + mutex_unlock(&iommu->lock);
> > + return ret;
> > +}
> > +
> > static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> > struct vfio_info_cap *caps)
> > {
> > struct vfio_info_cap_header *header;
> > struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
> > + u32 formats = 0;
> > + int ret;
> > +
> > + ret = vfio_iommu_get_stage1_format(iommu, &formats);
> > + if (ret) {
> > + pr_warn("Failed to get stage-1 format\n");
> > + return ret;
> > + }
> >
> > header = vfio_info_cap_add(caps, sizeof(*nesting_cap),
> > VFIO_IOMMU_TYPE1_INFO_CAP_NESTING,
> > 1);
> > @@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu,
> > /* nesting iommu type supports PASID requests (alloc/free) */
> > nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS;
> > }
> > + nesting_cap->stage1_formats = formats;
> >
> > return 0;
> > }
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index ed9881d..ebeaf3e 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -763,6 +763,7 @@ struct vfio_iommu_type1_info_cap_nesting {
> > struct vfio_info_cap_header header;
> > #define VFIO_IOMMU_PASID_REQS (1 << 0)
> > __u32 nesting_capabilities;
> > + __u32 stage1_formats;
>
> Do you plan to support multiple formats? If not, use a singular name.

I do have such a plan; e.g. it may be helpful when, one day, a platform
can support multiple formats.
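
One possible way a plural field could carry several formats is a
bitmask with one bit per format number. This is only a sketch under
that assumption; the patch as posted stores a single format value in
stage1_formats. The macro and helper names below are hypothetical, and
only the value 1 for the first format is taken from the comment in the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The patch comment states that the first format (Intel VT-d) is 1. */
#define PASID_FORMAT_INTEL_VTD	1u
/* Hypothetical second format, for illustration only. */
#define PASID_FORMAT_EXAMPLE_2	2u

/* Map a non-zero format number (1..32) to its bit in the mask. */
uint32_t stage1_format_bit(uint32_t format)
{
	assert(format >= 1 && format <= 32);
	return UINT32_C(1) << (format - 1);
}

/* Test whether a reported mask advertises the given format. */
bool stage1_format_supported(uint32_t stage1_formats, uint32_t format)
{
	return (stage1_formats & stage1_format_bit(format)) != 0;
}
```

Userspace could then test the reported mask against the format its
vIOMMU uses before trying to set up nesting.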

Regards,
Yi Liu