RE: [PATCH v5 02/15] iommu: Report domain nesting info

From: Liu, Yi L
Date: Mon Jul 20 2020 - 03:20:46 EST


Hi Eric,

> From: Auger Eric <eric.auger@xxxxxxxxxx>
> Sent: Saturday, July 18, 2020 12:29 AM
>
> Hi Yi,
>
> On 7/12/20 1:20 PM, Liu Yi L wrote:
> > IOMMUs that support nesting translation needs report the capability info
> s/needs/need to report

yep.

> > to userspace, e.g. the format of first level/stage paging structures.
> It gives information about requirements the userspace needs to implement
> plus other features characterizing the physical implementation.

Got it, will add it in the next version.

> >
> > This patch reports nesting info by DOMAIN_ATTR_NESTING. Caller can get
> > nesting info after setting DOMAIN_ATTR_NESTING.
> I guess you meant after selecting VFIO_TYPE1_NESTING_IOMMU?

Yes, that is what I meant. It is probably better to say that the caller can
get nesting info after selecting VFIO_TYPE1_NESTING_IOMMU.

> >
> > Cc: Kevin Tian <kevin.tian@xxxxxxxxx>
> > CC: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> > Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > Cc: Eric Auger <eric.auger@xxxxxxxxxx>
> > Cc: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> > Cc: Joerg Roedel <joro@xxxxxxxxxx>
> > Cc: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> > Signed-off-by: Liu Yi L <yi.l.liu@xxxxxxxxx>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> > ---
> > v4 -> v5:
> > *) address comments from Eric Auger.
> >
> > v3 -> v4:
> > *) split the SMMU driver changes to be a separate patch
> > *) move the @addr_width and @pasid_bits from vendor specific
> > part to generic part.
> > *) tweak the description for the @features field of struct
> > iommu_nesting_info.
> > *) add description on the @data[] field of struct iommu_nesting_info
> >
> > v2 -> v3:
> > *) remove cap/ecap_mask in iommu_nesting_info.
> > *) reuse DOMAIN_ATTR_NESTING to get nesting info.
> > *) return an empty iommu_nesting_info for SMMU drivers per Jean's
> > suggestion.
> > ---
> > include/uapi/linux/iommu.h | 77 ++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 77 insertions(+)
> >
> > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> > index 1afc661..d2a47c4 100644
> > --- a/include/uapi/linux/iommu.h
> > +++ b/include/uapi/linux/iommu.h
> > @@ -332,4 +332,81 @@ struct iommu_gpasid_bind_data {
> > } vendor;
> > };
> >
> > +/*
> > + * struct iommu_nesting_info - Information for nesting-capable IOMMU.
> > + * user space should check it before using
> > + * nesting capability.
> > + *
> > + * @size: size of the whole structure
> > + * @format: PASID table entry format, the same definition as struct
> > + * iommu_gpasid_bind_data @format.
> > + * @features: supported nesting features.
> > + * @flags: currently reserved for future extension.
> > + * @addr_width: The output addr width of first level/stage translation
> > + * @pasid_bits: Maximum supported PASID bits, 0 represents no PASID
> > + * support.
> > + * @data: vendor specific cap info. data[] structure type can be deduced
> > + * from @format field.
> > + *
> > + * +===============+======================================================+
> > + * | feature | Notes |
> > + * +===============+======================================================+
> > + * | SYSWIDE_PASID | PASIDs are managed in system-wide, instead of per |
> s/in system-wide/system-wide ?

got it.

> > + * | | device. When a device is assigned to userspace or |
> > + * | | VM, proper uAPI (userspace driver framework uAPI, |
> > + * | | e.g. VFIO) must be used to allocate/free PASIDs for |
> > + * | | the assigned device.
> Isn't it possible to be more explicit, something like:
> |
> System-wide PASID management is mandated by the physical IOMMU. All
> PASIDs allocation must be mediated through the TBD API.

yep, I can add it.

> > + * +---------------+------------------------------------------------------+
> > + * | BIND_PGTBL | The owner of the first level/stage page table must |
> > + * | | explicitly bind the page table to associated PASID |
> > + * | | (either the one specified in bind request or the |
> > + * | | default PASID of iommu domain), through userspace |
> > + * | | driver framework uAPI (e.g. VFIO_IOMMU_NESTING_OP). |
> As per your answer in https://lkml.org/lkml/2020/7/6/383, I now
> understand ARM would not expose that BIND_PGTBL nesting feature,

yes, that's my point.

> I still
> think the above wording is a bit confusing. Maybe you may explicitly
> talk about the PASID *entry* that needs to be passed from guest to host.
> On ARM we directly pass the PASID table but when reading the above
> description I fail to determine if this does not fit that description.

yes, I can do it.

> > + * +---------------+------------------------------------------------------+
> > + * | CACHE_INVLD | The owner of the first level/stage page table must |
> > + * | | explicitly invalidate the IOMMU cache through uAPI |
> > + * | | provided by userspace driver framework (e.g. VFIO) |
> > + * | | according to vendor-specific requirement when |
> > + * | | changing the page table. |
> > + * +---------------+------------------------------------------------------+
>
> instead of using the "uAPI provided by userspace driver framework (e.g.
> VFIO)", can't we use the so-called IOMMU UAPI terminology which now has
> a userspace documentation?

The problem is that the current IOMMU UAPI definitions are actually embedded
in other VFIO UAPI. If it makes the description clearer, I can follow
your suggestion. :-)
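
To make the table above concrete, here is a rough sketch of how a userspace
consumer might test the reported @features bits before relying on nesting.
The feature macros are the ones from the patch; the helper name
nesting_has_features() is made up for illustration and is not part of the
patch or of any existing uAPI.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bits as defined in the quoted patch. */
#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)

/* Sanity check for the sketch: two of the bits must not overlap. */
static_assert((IOMMU_NESTING_FEAT_BIND_PGTBL &
	       IOMMU_NESTING_FEAT_CACHE_INVLD) == 0,
	      "feature bits must be distinct");

/*
 * Hypothetical helper (illustration only): returns 1 when every bit in
 * @required is also set in @features, 0 otherwise.
 */
static int nesting_has_features(uint32_t features, uint32_t required)
{
	return (features & required) == required;
}
```

A caller that needs both explicit page table binding and cache invalidation
would pass BIND_PGTBL | CACHE_INVLD as @required and fall back to a
non-nested setup when the helper returns 0.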

>
> > + *
> > + * @data[] types defined for @format:
> > + * +================================+=====================================+
> > + * | @format | @data[] |
> > + * +================================+=====================================+
> > + * | IOMMU_PASID_FORMAT_INTEL_VTD | struct iommu_nesting_info_vtd |
> > + * +--------------------------------+-------------------------------------+
> > + *
> > + */
> > +struct iommu_nesting_info {
> > + __u32 size;
> shouldn't it be @argsz to fit the iommu uapi convention and take benefit
> to put the flags field just below?

Makes sense.
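
Just to illustrate the suggestion, the layout might end up looking roughly
like the sketch below, with @size renamed to @argsz and @flags placed right
after it. This is only a guess at the next version, not the final
definition. It uses <stdint.h> types so that it builds outside the kernel
tree, and the struct and helper names are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout only: the fields are the ones from the quoted patch,
 * with @size renamed to @argsz and @flags moved directly below it.
 */
struct nesting_info_sketch {
	uint32_t argsz;		/* size of the whole structure */
	uint32_t flags;		/* reserved for future extension */
	uint32_t format;
	uint32_t features;
	uint16_t addr_width;
	uint16_t pasid_bits;
	uint32_t padding;
	uint8_t  data[];	/* vendor specific info, type given by @format */
};

/* The fixed header stays at 24 bytes, so data[] remains 8-byte aligned. */
static_assert(offsetof(struct nesting_info_sketch, data) == 24,
	      "unexpected header size");

/*
 * Hypothetical check: returns 1 when @argsz covers at least the fixed
 * header, 0 otherwise.
 */
static int nesting_info_argsz_ok(uint32_t argsz)
{
	return argsz >= offsetof(struct nesting_info_sketch, data);
}
```

Reordering the fields does not change the header size, so the vendor data
keeps the same offset as in the v5 layout.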

> > + __u32 format;
> > +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID (1 << 0)
> > +#define IOMMU_NESTING_FEAT_BIND_PGTBL (1 << 1)
> > +#define IOMMU_NESTING_FEAT_CACHE_INVLD (1 << 2)
> > + __u32 features;
> > + __u32 flags;
> > + __u16 addr_width;
> > + __u16 pasid_bits;
> > + __u32 padding;
> > + __u8 data[];
> > +};
> > +
> > +/*
> > + * struct iommu_nesting_info_vtd - Intel VT-d specific nesting info
> > + *
> > + * @flags: VT-d specific flags. Currently reserved for future
> > + * extension.
> must be set to 0?

yes. will add it.
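
For illustration, a consumer could then interpret data[] along the lines of
the sketch below. The value 1 for the VT-d format and all names ending in
_sketch or _SKETCH are assumptions made for this example, and treating a
non-zero reserved @flags as an error is just one possible policy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed value for illustration; the real one comes from the uapi header. */
#define PASID_FORMAT_INTEL_VTD_SKETCH	1

/* Mirrors the fields of struct iommu_nesting_info_vtd in the quoted patch. */
struct vtd_info_sketch {
	uint32_t flags;		/* reserved, expected to be 0 */
	uint32_t padding;
	uint64_t cap_reg;
	uint64_t ecap_reg;
};

static_assert(sizeof(struct vtd_info_sketch) == 24, "unexpected vtd size");

/*
 * Hypothetical parser: copies the vendor data out of @data when @format
 * selects VT-d, @len is large enough and the reserved @flags field is 0.
 * Returns 0 on success, -1 otherwise.
 */
static int parse_vtd_info(uint32_t format, const uint8_t *data, size_t len,
			  struct vtd_info_sketch *out)
{
	if (format != PASID_FORMAT_INTEL_VTD_SKETCH || len < sizeof(*out))
		return -1;
	/* memcpy avoids assuming anything about the alignment of @data. */
	memcpy(out, data, sizeof(*out));
	return out->flags == 0 ? 0 : -1;
}
```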

Thanks,
Yi Liu

> > + * @cap_reg: Describe basic capabilities as defined in VT-d capability
> > + * register.
> > + * @ecap_reg: Describe the extended capabilities as defined in VT-d
> > + * extended capability register.
> > + */
> > +struct iommu_nesting_info_vtd {
> > + __u32 flags;
> > + __u32 padding;
> > + __u64 cap_reg;
> > + __u64 ecap_reg;
> > +};
> > +
> > #endif /* _UAPI_IOMMU_H */
> Thanks
>
> Eric
> >