RE: [PATCH v2 1/3] iommu/uapi: Define uapi version and capabilities

From: Tian, Kevin
Date: Wed Apr 01 2020 - 01:39:04 EST


> From: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> Sent: Tuesday, March 31, 2020 11:55 PM
>
> On Tue, 31 Mar 2020 06:06:38 +0000
> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>
> > > From: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> > > Sent: Tuesday, March 31, 2020 12:08 AM
> > >
> > > On Mon, 30 Mar 2020 05:40:40 +0000
> > > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> > >
> > > > > From: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> > > > > Sent: Saturday, March 28, 2020 7:54 AM
> > > > >
> > > > > On Fri, 27 Mar 2020 00:47:02 -0700
> > > > > Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > > On Fri, Mar 27, 2020 at 02:49:55AM +0000, Tian, Kevin wrote:
> > > > > > > If those API calls are inter-dependent for composing a
> > > > > > > feature (e.g. SVA), shouldn't we need a way to check them
> > > > > > > together before exposing the feature to the guest, e.g.
> > > > > > > through a iommu_get_uapi_capabilities interface?
> > > > > >
> > > > > > Yes, that makes sense. The important bit is to have a
> > > > > > capability flags and not version numbers.
> > > > >
> > > > > The challenge is that there are two consumers in the kernel for
> > > > > this. 1. VFIO only look for compatibility, and size of each data
> > > > > struct such that it can copy_from_user.
> > > > >
> > > > > 2. IOMMU driver, the "real consumer" of the content.
> > > > >
> > > > > For 2, I agree and we do plan to use the capability flags to
> > > > > check content and maintain backward compatibility etc.
> > > > >
> > > > > For VFIO, it is difficult to do size look up based on capability
> > > > > flags.
> > > >
> > > > Can you elaborate the difficulty in VFIO? if, as Christoph Hellwig
> > > > pointed out, version number is already avoided everywhere, it is
> > > > interesting to know whether this work becomes a real exception
> > > > or just requires a different mindset.
> > > >
> > > From VFIO p.o.v. the IOMMU UAPI data is opaque, it only needs to do
> > > two things:
> > > 1. is the UAPI compatible?
> > > 2. what is the size to copy?
> > >
> > > If you look at the version number, this is really a "version as
> > > size" lookup, as provided by the helper function in this patch. An
> > > example can be the newly introduced clone3 syscall.
> > > https://lwn.net/Articles/792628/
> > > In clone3, new version must have new size. The slight difference
> > > here is that, unlike clone3, we have multiple data structures
> > > instead of a single struct clone_args {}. And each struct has flags
> > > to enumerate its contents besides size.
> >
> > Thanks for providing that link. However clone3 doesn't include a
> > version field to do "version as size" lookup. Instead, as you said,
> > it includes a size parameter which sounds like the option 3 (argsz)
> > listed below.
> >
> Right, there is no version in clone3. size = version. I view this as
> a 1:1 lookup.
>
> > >
> > > Besides breaching data abstraction, if VFIO has to check IOMMU
> > > flags to determine the sizes, it has many combinations.
> > >
> > > We also separate the responsibilities into two parts
> > > 1. compatibility - version, size by VFIO
> > > 2. sanity check - capability flags - by IOMMU
> >
> > I feel argsz+flags approach can perfectly meet above requirement. The
> > userspace set the size and flags for whatever capabilities it uses,
> > and VFIO simply copies the parameters by size and pass to IOMMU for
> > further sanity check. Of course the assumption is that we do provide
> > an interface for userspace to enumerate all supported capabilities.
> >
> You cannot trust user for argsz. the size to be copied from user must
> be based on knowledge in kernel. That is why we have this version to
> size lookup.
>
> In VFIO, the size to copy is based on knowledge of each VFIO UAPI
> structures and VFIO flags. But here the flags are IOMMU UAPI flags. As
> you pointed out in another thread, VFIO is one user.

If that is the case, can we let VFIO only copy its own UAPI fields while
simply passing the user pointer of IOMMU UAPI structure to IOMMU
driver for further size check and copy? Otherwise we are entering a
dead end that VFIO doesn't want to parse a structure which is not
defined by him while using version to represent the black box size
is considered as a discarded scheme and doesn't scale well...

>
> > Is there anything that I overlooked here? I suppose there might be
> > some difficulties that block you from going the argsz way...
> >
> > Thanks
> > Kevin
> >
> > >
> > > I think the latter matches what Christoph's comments. So we are in
> > > agreement at the IOMMU level :)
> > >
> > > For example:
> > > During guest PASID bind, IOMMU driver operates on the data passed
> > > from VFIO and check format & flags to take different code path.
> > >
> > > #define IOMMU_PASID_FORMAT_INTEL_VTD 1
> > > __u32 format;
> > > #define IOMMU_SVA_GPASID_VAL (1 << 0) /* guest PASID valid */
> > > __u64 flags;
> > >
> > > Jacob
> > >
> > > > btw the most relevant discussion which I can find out now is here:
> > > > https://lkml.org/lkml/2020/2/3/1126
> > > >
> > > > It mentioned 3 options for handling extension:
> > > > --
> > > > 1. Disallow adding new members to each structure other than reuse
> > > > padding bits or adding union members at the end.
> > > > 2. Allow extension of the structures beyond union, but union size
> > > > has to be fixed with reserved spaces
> > > > 3. Adopt VFIO argsz scheme, I don't think we need version for each
> > > > struct anymore. argsz implies the version that user is using
> > > > assuming UAPI data is extension only.
> > > > --
> > > >
> > > > the first two are both version-based. Looks most guys agreed with
> > > > option-1 (in this v2), but Alex didn't give his opinion at the
> > > > moment. The last response from him was the raise of option-3 using
> > > > argsz to avoid version. So, we also need hear from him. Alex?
> > > >
> > > > Thanks
> > > > Kevin
> > >
> > > [Jacob Pan]
>
> [Jacob Pan]

Thanks
Kevin