Re: [PATCH v5 02/15] iommu: Report domain nesting info

From: Auger Eric
Date: Fri Jul 17 2020 - 12:29:50 EST


Hi Yi,

On 7/12/20 1:20 PM, Liu Yi L wrote:
> IOMMUs that support nesting translation needs report the capability info
s/needs/need to report
> to userspace, e.g. the format of first level/stage paging structures.
It gives information about the requirements userspace needs to meet,
plus other features characterizing the physical implementation.
>
> This patch reports nesting info by DOMAIN_ATTR_NESTING. Caller can get
> nesting info after setting DOMAIN_ATTR_NESTING.
I guess you meant after selecting VFIO_TYPE1_NESTING_IOMMU?
>
> Cc: Kevin Tian <kevin.tian@xxxxxxxxx>
> CC: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Cc: Eric Auger <eric.auger@xxxxxxxxxx>
> Cc: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> Cc: Joerg Roedel <joro@xxxxxxxxxx>
> Cc: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> Signed-off-by: Liu Yi L <yi.l.liu@xxxxxxxxx>
> Signed-off-by: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> ---
> v4 -> v5:
> *) address comments from Eric Auger.
>
> v3 -> v4:
> *) split the SMMU driver changes to be a separate patch
> *) move the @addr_width and @pasid_bits from vendor specific
> part to generic part.
> *) tweak the description for the @features field of struct
> iommu_nesting_info.
> *) add description on the @data[] field of struct iommu_nesting_info
>
> v2 -> v3:
> *) remove cap/ecap_mask in iommu_nesting_info.
> *) reuse DOMAIN_ATTR_NESTING to get nesting info.
> *) return an empty iommu_nesting_info for SMMU drivers per Jean's
> suggestion.
> ---
> include/uapi/linux/iommu.h | 77 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 77 insertions(+)
>
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 1afc661..d2a47c4 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -332,4 +332,81 @@ struct iommu_gpasid_bind_data {
> } vendor;
> };
>
> +/*
> + * struct iommu_nesting_info - Information for nesting-capable IOMMU.
> + * user space should check it before using
> + * nesting capability.
> + *
> + * @size: size of the whole structure
> + * @format: PASID table entry format, the same definition as struct
> + * iommu_gpasid_bind_data @format.
> + * @features: supported nesting features.
> + * @flags: currently reserved for future extension.
> + * @addr_width: The output addr width of first level/stage translation
> + * @pasid_bits: Maximum supported PASID bits, 0 represents no PASID
> + * support.
> + * @data: vendor specific cap info. data[] structure type can be deduced
> + * from @format field.
> + *
> + * +===============+======================================================+
> + * | feature | Notes |
> + * +===============+======================================================+
> + * | SYSWIDE_PASID | PASIDs are managed in system-wide, instead of per |
s/in system-wide/system-wide ?
> + * | | device. When a device is assigned to userspace or |
> + * | | VM, proper uAPI (userspace driver framework uAPI, |
> + * | | e.g. VFIO) must be used to allocate/free PASIDs for |
> + * | | the assigned device.
Isn't it possible to be more explicit, something like:
|
System-wide PASID management is mandated by the physical IOMMU. All
PASID allocations must be mediated through the TBD API.
> + * +---------------+------------------------------------------------------+
> + * | BIND_PGTBL | The owner of the first level/stage page table must |
> + * | | explicitly bind the page table to associated PASID |
> + * | | (either the one specified in bind request or the |
> + * | | default PASID of iommu domain), through userspace |
> + * | | driver framework uAPI (e.g. VFIO_IOMMU_NESTING_OP). |
As per your answer in https://lkml.org/lkml/2020/7/6/383, I now
understand ARM would not expose that BIND_PGTBL nesting feature. However,
I still think the above wording is a bit confusing. Maybe you could
explicitly talk about the PASID *entry* that needs to be passed from guest
to host. On ARM we directly pass the PASID table, and when reading the
above description I cannot tell whether or not that case fits it.
> + * +---------------+------------------------------------------------------+
> + * | CACHE_INVLD | The owner of the first level/stage page table must |
> + * | | explicitly invalidate the IOMMU cache through uAPI |
> + * | | provided by userspace driver framework (e.g. VFIO) |
> + * | | according to vendor-specific requirement when |
> + * | | changing the page table. |
> + * +---------------+------------------------------------------------------+

Instead of using "uAPI provided by userspace driver framework (e.g.
VFIO)", can't we use the so-called IOMMU UAPI terminology, which now has
userspace documentation?
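For illustration only, a minimal userspace sketch of how a caller might act on the feature table above, assuming the bit values defined further down in this patch. decode_nesting_features() and struct nesting_obligations are hypothetical names, not an existing API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as proposed in this v5 patch; they may still change. */
#define IOMMU_NESTING_FEAT_SYSWIDE_PASID (1 << 0)
#define IOMMU_NESTING_FEAT_BIND_PGTBL    (1 << 1)
#define IOMMU_NESTING_FEAT_CACHE_INVLD   (1 << 2)
#define NESTING_FEAT_KNOWN (IOMMU_NESTING_FEAT_SYSWIDE_PASID | \
			    IOMMU_NESTING_FEAT_BIND_PGTBL | \
			    IOMMU_NESTING_FEAT_CACHE_INVLD)

/* Hypothetical: what the owner of the first level/stage must do itself. */
struct nesting_obligations {
	bool alloc_pasid_via_uapi;   /* SYSWIDE_PASID */
	bool bind_pgtbl_via_uapi;    /* BIND_PGTBL */
	bool propagate_cache_invld;  /* CACHE_INVLD */
};

/*
 * Returns -1 if @features carries a bit this userspace does not know
 * (a conservative caller would then refuse to enable nesting), else 0
 * after filling @ob with the obligations implied by each known bit.
 */
static int decode_nesting_features(uint32_t features,
				   struct nesting_obligations *ob)
{
	if (features & ~(uint32_t)NESTING_FEAT_KNOWN)
		return -1;

	/* Conversion to bool yields true for any non-zero masked value. */
	ob->alloc_pasid_via_uapi  = features & IOMMU_NESTING_FEAT_SYSWIDE_PASID;
	ob->bind_pgtbl_via_uapi   = features & IOMMU_NESTING_FEAT_BIND_PGTBL;
	ob->propagate_cache_invld = features & IOMMU_NESTING_FEAT_CACHE_INVLD;
	return 0;
}
```

Rejecting unknown bits is a deliberate choice here: each bit adds an obligation on userspace, so silently ignoring one would be unsafe.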

> + *
> + * @data[] types defined for @format:
> + * +================================+=====================================+
> + * | @format | @data[] |
> + * +================================+=====================================+
> + * | IOMMU_PASID_FORMAT_INTEL_VTD | struct iommu_nesting_info_vtd |
> + * +--------------------------------+-------------------------------------+
> + *
> + */
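Similarly, a hedged sketch of how userspace could interpret @pasid_bits and @addr_width. nesting_max_pasid() and nesting_addr_fits() are hypothetical helpers, and rejecting values above the 20-bit PCIe PASID limit is my assumption about what a careful caller would do, not something the patch requires:

```c
#include <stdbool.h>
#include <stdint.h>

/* PCIe PASIDs are at most 20 bits wide. */
#define PCIE_PASID_MAX_BITS 20

/*
 * Hypothetical helper: derive the largest usable PASID from @pasid_bits.
 * Returns -1 when PASIDs are unsupported (pasid_bits == 0, as documented
 * above) or the value exceeds the PCIe limit, 0 otherwise.
 */
static int nesting_max_pasid(uint16_t pasid_bits, uint32_t *max_pasid)
{
	if (pasid_bits == 0 || pasid_bits > PCIE_PASID_MAX_BITS)
		return -1;
	*max_pasid = (1u << pasid_bits) - 1;
	return 0;
}

/*
 * Hypothetical helper: check that an address produced by the first
 * level/stage translation fits within @addr_width bits.
 */
static bool nesting_addr_fits(uint64_t addr, uint16_t addr_width)
{
	if (addr_width >= 64)   /* avoid an undefined 64-bit shift */
		return true;
	return (addr >> addr_width) == 0;
}
```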
> +struct iommu_nesting_info {
> + __u32 size;
Shouldn't it be @argsz, to fit the IOMMU UAPI convention, and can't we
take the opportunity to put the flags field just below it?
> + __u32 format;
> +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID (1 << 0)
> +#define IOMMU_NESTING_FEAT_BIND_PGTBL (1 << 1)
> +#define IOMMU_NESTING_FEAT_CACHE_INVLD (1 << 2)
> + __u32 features;
> + __u32 flags;
> + __u16 addr_width;
> + __u16 pasid_bits;
> + __u32 padding;
> + __u8 data[];
> +};
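To make the @argsz suggestion concrete, here is a sketch of the reordered layout, using <stdint.h> types in place of __u32/__u16 so it builds outside the kernel. struct iommu_nesting_info_argsz and nesting_info_check_argsz() are hypothetical names, and the size handshake (report the needed size back when the buffer is too small) is an assumption loosely modelled on VFIO's argsz usage, not something this patch defines:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reordering: @argsz first, @flags right after it. */
struct iommu_nesting_info_argsz {
	uint32_t argsz;      /* size of the whole structure, including data[] */
	uint32_t flags;      /* reserved, must be 0 */
	uint32_t format;
	uint32_t features;
	uint16_t addr_width;
	uint16_t pasid_bits;
	uint32_t padding;
	uint8_t  data[];     /* vendor data, type deduced from @format */
};

/* The reorder keeps the 24-byte header with no implicit holes. */
_Static_assert(offsetof(struct iommu_nesting_info_argsz, flags) == 4, "flags");
_Static_assert(offsetof(struct iommu_nesting_info_argsz, data) == 24, "data");
_Static_assert(sizeof(struct iommu_nesting_info_argsz) == 24, "size");

/*
 * Hypothetical handshake: returns 0 if @argsz covers the header plus
 * @vendor_len bytes of vendor data; otherwise returns -1. In both cases
 * *needed receives the required size so the caller can retry.
 */
static int nesting_info_check_argsz(uint32_t argsz, uint32_t vendor_len,
				    uint32_t *needed)
{
	*needed = (uint32_t)offsetof(struct iommu_nesting_info_argsz, data) +
		  vendor_len;
	return argsz >= *needed ? 0 : -1;
}
```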
> +
> +/*
> + * struct iommu_nesting_info_vtd - Intel VT-d specific nesting info
> + *
> + * @flags: VT-d specific flags. Currently reserved for future
> + * extension.
must be set to 0?
> + * @cap_reg: Describe basic capabilities as defined in VT-d capability
> + * register.
> + * @ecap_reg: Describe the extended capabilities as defined in VT-d
> + * extended capability register.
> + */
> +struct iommu_nesting_info_vtd {
> + __u32 flags;
> + __u32 padding;
> + __u64 cap_reg;
> + __u64 ecap_reg;
> +};
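And a sketch of how userspace might pull the VT-d data out of @data[] once @format says it is IOMMU_PASID_FORMAT_INTEL_VTD. The mirror structs and parse_nesting_vtd() are hypothetical, the numeric value of IOMMU_PASID_FORMAT_INTEL_VTD is assumed here (check the real header), and treating non-zero @flags as an error assumes the answer to my "must be set to 0?" question is yes:

```c
#include <stdint.h>
#include <string.h>

/* Assumed value; take it from include/uapi/linux/iommu.h in real code. */
#ifndef IOMMU_PASID_FORMAT_INTEL_VTD
#define IOMMU_PASID_FORMAT_INTEL_VTD 1
#endif

/* Userspace mirrors of the two structs in this patch (__u32 -> uint32_t). */
struct nesting_info_hdr {
	uint32_t size;
	uint32_t format;
	uint32_t features;
	uint32_t flags;
	uint16_t addr_width;
	uint16_t pasid_bits;
	uint32_t padding;
	/* uint8_t data[] follows at offset 24 */
};

struct nesting_info_vtd {
	uint32_t flags;
	uint32_t padding;
	uint64_t cap_reg;
	uint64_t ecap_reg;
};

/*
 * Hypothetical parser: copy the VT-d vendor data out of a buffer filled by
 * the kernel. memcpy avoids any alignment assumption on data[]. Rejects
 * other formats, truncated buffers and non-zero VT-d flags.
 */
static int parse_nesting_vtd(const void *buf, size_t len,
			     struct nesting_info_vtd *vtd)
{
	struct nesting_info_hdr hdr;

	if (len < sizeof(hdr))
		return -1;
	memcpy(&hdr, buf, sizeof(hdr));
	if (hdr.format != IOMMU_PASID_FORMAT_INTEL_VTD)
		return -1;
	if (hdr.size < sizeof(hdr) + sizeof(*vtd) || hdr.size > len)
		return -1;
	memcpy(vtd, (const uint8_t *)buf + sizeof(hdr), sizeof(*vtd));
	if (vtd->flags != 0)
		return -1;
	return 0;
}
```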
> +
> #endif /* _UAPI_IOMMU_H */
Thanks

Eric