Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
From: Jason Gunthorpe
Date: Wed Sep 22 2021 - 08:51:05 EST
On Wed, Sep 22, 2021 at 03:22:42AM +0000, Tian, Kevin wrote:
> > From: Tian, Kevin
> > Sent: Wednesday, September 22, 2021 9:07 AM
> >
> > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > Sent: Wednesday, September 22, 2021 8:55 AM
> > >
> > > On Tue, Sep 21, 2021 at 11:56:06PM +0000, Tian, Kevin wrote:
> > > > > The opened atomic is aweful. A newly created fd should start in a
> > > > > state where it has a disabled fops
> > > > >
> > > > > The only thing the disabled fops can do is register the device to the
> > > > > iommu fd. When successfully registered the device gets the normal fops.
> > > > >
> > > > > The registration steps should be done under a normal lock inside the
> > > > > vfio_device. If a vfio_device is already registered then further
> > > > > registration should fail.
> > > > >
> > > > > Getting the device fd via the group fd triggers the same sequence as
> > > > > above.
> > > > >
> > > >
> > > > Above works if the group interface is also connected to iommufd, i.e.
> > > > making vfio type1 as a shim. In this case we can use the registration
> > > > status as the exclusive switch. But if we keep vfio type1 separate as
> > > > today, then a new atomic is still necessary. This all depends on how
> > > > we want to deal with vfio type1 and iommufd, and possibly what's
> > > > discussed here just adds another pound to the shim option...
> > >
> > > No, it works the same either way, the group FD path is identical to
> > > the normal FD path, it just triggers some of the state transitions
> > > automatically internally instead of requiring external ioctls.
> > >
> > > The device FDs starts disabled, an internal API binds it to the iommu
> > > via open coding with the group API, and then the rest of the APIs can
> > > be enabled. Same as today.
> > >
>
> After reading your comments on patch08, I may have a clearer picture
> on your suggestion. The key is to handle exclusive access at the binding
> time (based on vdev->iommu_dev). Please see whether below makes
> sense:
>
> Shared sequence:
>
> 1) initialize the device with a parked fops;
> 2) need binding (explicit or implicit) to move away from parked fops;
> 3) switch to normal fops after successful binding;
>
> 1) happens at device probe.
1 happens when the cdev is setup with the parked fops, yes. I'd say it
happens at fd open time.
> for nongroup 2) and 3) are done together in VFIO_DEVICE_GET_IOMMUFD:
>
> - 2) is done by calling .bind_iommufd() callback;
> - 3) could be done within .bind_iommufd(), or via a new callback e.g.
> .finalize_device(). The latter may be preferred for the group interface;
> - Two threads may open the same device simultaneously, with exclusive
> access guaranteed by iommufd_bind_device();
> - Open() after successful binding is rejected, since normal fops has been
> activated. This is checked upon vdev->iommu_dev;
Almost, open is always successful, what fails is
VFIO_DEVICE_GET_IOMMUFD (or the group equivilant). The user ends up
with a FD that is useless, cannot reach the ops and thus cannot impact
the device it doesn't own in any way.
It is similar to opening a group FD
> for group 2/3) are done together in VFIO_GROUP_GET_DEVICE_FD:
>
> - 2) is done by open coding bind_iommufd + attach_ioas. Create an
> iommufd_device object and record it to vdev->iommu_dev
> - 3) is done by calling .finalize_device();
> - open() after a valid vdev->iommu_dev is rejected. this also ensures
> exclusive ownership with the nongroup path.
Same comment as above, groups should go through the same sequence of
steps, create a FD, attempt to bind, if successuful make the FD
operational.
The only difference is that failure in these steps does not call
fd_install(). For this reason alone the FD could start out with
operational fops, but it feels like a needless optimization.
> If Alex also agrees with it, this might be another mini-series to be merged
> (just for group path) before this one. Doing so sort of nullifies the existing
> group/container attaching process, where attach_ioas will be skipped and
> now the security context is established when the device is opened.
I think it is really important to unify DMA exclusion model and lower
to the core iommu code. If there is a reason the exclusion must be
triggered on group fd open then the iommu core code should provide an
API to do that which interworks with the device API iommufd will work.
But I would start here because it is much simpler to understand..
Jason