Re: [PATCH v2 1/3] docs: IOMMU user API

From: Alex Williamson
Date: Fri Jun 19 2020 - 12:38:24 EST


On Fri, 19 Jun 2020 03:30:24 +0000
"Liu, Yi L" <yi.l.liu@xxxxxxxxx> wrote:

> Hi Alex,
>
> > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > Sent: Friday, June 19, 2020 10:55 AM
> >
> > On Fri, 19 Jun 2020 02:15:36 +0000
> > "Liu, Yi L" <yi.l.liu@xxxxxxxxx> wrote:
> >
> > > Hi Alex,
> > >
> > > > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > > Sent: Friday, June 19, 2020 5:48 AM
> > > >
> > > > On Wed, 17 Jun 2020 08:28:24 +0000
> > > > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> > > >
> > > > > > From: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> > > > > > Sent: Wednesday, June 17, 2020 2:20 PM
> > > > > >
> > > > > > > From: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> > > > > > > Sent: Tuesday, June 16, 2020 11:22 PM
> > > > > > >
> > > > > > > On Thu, 11 Jun 2020 17:27:27 -0700 Jacob Pan
> > > > > > > <jacob.jun.pan@xxxxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > > >
> > > > > > > > > But then I thought it even better if VFIO leaves the
> > > > > > > > > entire copy_from_user() to the layer consuming it.
> > > > > > > > >
> > > > > > > > OK. Sounds good; that was what Kevin suggested also. I just
> > > > > > > > wasn't sure how much VFIO wants to inspect, as I thought the
> > > > > > > > VFIO layer wanted to do a sanity check.
> > > > > > > >
> > > > > > > > Anyway, I will move copy_from_user to the iommu uapi layer.
> > > > > > >
> > > > > > > Just one more point brought up by Yi when we discussed this offline.
> > > > > > >
> > > > > > > If we move copy_from_user to the iommu uapi layer, then there
> > > > > > > will be multiple copy_from_user calls for the same data when a
> > > > > > > VFIO container has multiple domains and devices. For bind, it
> > > > > > > might be OK, but it could mean additional overhead for TLB
> > > > > > > flush requests from the guest.
> > > > > >
> > > > > > I think it is the same for both the bind and TLB flush paths:
> > > > > > there will be multiple copy_from_user calls.
> > > > >
> > > > > Multiple copies are possibly fine. In reality we allow only one
> > > > > group per nesting container (as described in patch [03/15]), and
> > > > > usually there is just one SVA-capable device per group.
> > > > >
> > > > > >
> > > > > > BTW, for moving the data copy to the iommu layer, there is
> > > > > > another point which needs to be considered. VFIO needs to do an
> > > > > > unbind in the bind path if the bind failed, so it will assemble
> > > > > > unbind_data and pass it to the iommu layer. If the iommu layer
> > > > > > does the copy_from_user, I think it will fail. Any idea?
> > > >
> > > > If a call into a UAPI fails, there should be nothing to undo.
> > > > Creating a partial setup for a failed call that needs to be undone
> > > > by the caller is not good practice.
> > >
> > > Is it still a problem if it's VFIO that undoes the partial setup
> > > before returning to user space?
> >
> > Yes. If a UAPI function fails there should be no residual effect.
>
> OK. iommu_sva_bind_gpasid() is a per-device call, and there is no
> residual effect if it fails, so no partial setup happens per device.
>
> But VFIO needs to use iommu_group_for_each_dev() to do the bind, so
> if iommu_group_for_each_dev() fails, I guess VFIO needs to undo the
> partial setup for the group, right?

Yes, each individual call should make no changes if it fails, but the
caller would need to unwind the separate calls. If this introduces too
much knowledge to the caller for the group case, maybe there should be
a group-level function in the iommu code to handle that.
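
Something like the sketch below is what I have in mind. To be clear,
the function name and the unwind loop are invented here; the real
series would decide the interface (group_device is the private
per-group list entry in the iommu core):

int iommu_group_bind_gpasid(struct iommu_group *group,
			    struct iommu_domain *domain,
			    struct iommu_gpasid_bind_data *data)
{
	struct group_device *device;
	int ret = 0;

	mutex_lock(&group->mutex);
	list_for_each_entry(device, &group->devices, list) {
		ret = iommu_sva_bind_gpasid(domain, device->dev, data);
		if (ret)
			goto unwind;
	}
	mutex_unlock(&group->mutex);
	return 0;

unwind:
	/* Undo only the binds that already succeeded. */
	list_for_each_entry_continue_reverse(device, &group->devices, list)
		iommu_sva_unbind_gpasid(domain, device->dev, data->hpasid);
	mutex_unlock(&group->mutex);
	return ret;
}

That keeps the all-or-nothing behavior in the iommu core, so the
caller never sees a partially bound group. Thanks,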

Alex

> > > > > This might be mitigated if we go back to using the same bind_data
> > > > > for both bind/unbind. Then you can reuse the user object for
> > > > > unwinding.
> > > > >
> > > > > However, there is another case where VFIO may need to assemble the
> > > > > bind_data itself. When a VM is killed, VFIO needs to walk the
> > > > > allocated PASIDs and unbind them one-by-one. In that case
> > > > > copy_from_user doesn't work since the data is created by the
> > > > > kernel. Alex, do you have a suggestion for how this usage can be
> > > > > supported? e.g. asking the IOMMU driver to provide two sets of
> > > > > APIs to handle user/kernel generated requests?
> > > >
> > > > Yes, it seems like vfio would need to make use of a driver API to
> > > > do this; we shouldn't be faking a user buffer in the kernel in order
> > > > to call through to a UAPI. Thanks,
> > >
> > > OK, so if VFIO wants to issue an unbind by itself, it should use an
> > > API which passes a kernel buffer to the IOMMU layer. If the unbind
> > > request is from user space, then VFIO should use another API which
> > > passes the user buffer pointer to the IOMMU layer. Makes sense; will
> > > align with Jacob.
> >
> > Sounds right to me. Different approaches might be used for the driver
> > API versus the UAPI; perhaps there is no buffer at all. Thanks,
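
To expand on what I meant there, the split could look something like
the below. The uapi wrapper name is invented here, and the real
argsz/version checking is up to Jacob's series; the point is that only
the uapi variant ever touches the user pointer:

/*
 * Kernel-internal API: the caller assembles the arguments itself,
 * e.g. when VFIO walks the allocated PASIDs to tear them down for a
 * dying VM. No user buffer involved.
 */
int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
			    struct device *dev, ioasid_t pasid);

/*
 * UAPI variant: consumes the user pointer passed down from the VFIO
 * ioctl, so the copy_from_user() lives in the iommu layer and VFIO
 * never fakes a user buffer in the kernel.
 */
int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain,
				 struct device *dev, void __user *udata)
{
	struct iommu_gpasid_bind_data data;

	if (copy_from_user(&data, udata, sizeof(data)))
		return -EFAULT;

	/* A real version would sanity check data.version/flags here. */
	return iommu_sva_unbind_gpasid(domain, dev, data.hpasid);
}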
>
> Thanks for your coaching. It may require Jacob to add APIs in the iommu
> layer for the two purposes.
>
> Regards,
> Yi Liu
>
> > Alex
>