RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

From: Tian, Kevin
Date: Tue Sep 21 2021 - 23:22:59 EST


> From: Tian, Kevin
> Sent: Wednesday, September 22, 2021 9:07 AM
>
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Wednesday, September 22, 2021 8:55 AM
> >
> > On Tue, Sep 21, 2021 at 11:56:06PM +0000, Tian, Kevin wrote:
> > > > The opened atomic is aweful. A newly created fd should start in a
> > > > state where it has a disabled fops
> > > >
> > > > The only thing the disabled fops can do is register the device to the
> > > > iommu fd. When successfully registered the device gets the normal fops.
> > > >
> > > > The registration steps should be done under a normal lock inside the
> > > > vfio_device. If a vfio_device is already registered then further
> > > > registration should fail.
> > > >
> > > > Getting the device fd via the group fd triggers the same sequence as
> > > > above.
> > > >
> > >
> > > Above works if the group interface is also connected to iommufd, i.e.
> > > making vfio type1 as a shim. In this case we can use the registration
> > > status as the exclusive switch. But if we keep vfio type1 separate as
> > > today, then a new atomic is still necessary. This all depends on how
> > > we want to deal with vfio type1 and iommufd, and possibly what's
> > > discussed here just adds another pound to the shim option...
> >
> > No, it works the same either way, the group FD path is identical to
> > the normal FD path, it just triggers some of the state transitions
> > automatically internally instead of requiring external ioctls.
> >
> > The device FDs starts disabled, an internal API binds it to the iommu
> > via open coding with the group API, and then the rest of the APIs can
> > be enabled. Same as today.
> >

After reading your comments on patch08, I may have a clearer picture
on your suggestion. The key is to handle exclusive access at the binding
time (based on vdev->iommu_dev). Please see whether below makes
sense:

Shared sequence:

1) initialize the device with a parked fops;
2) need binding (explicit or implicit) to move away from parked fops;
3) switch to normal fops after successful binding;

1) happens at device probe.

for nongroup 2) and 3) are done together in VFIO_DEVICE_GET_IOMMUFD:

- 2) is done by calling .bind_iommufd() callback;
- 3) could be done within .bind_iommufd(), or via a new callback e.g.
.finalize_device(). The latter may be preferred for the group interface;
- Two threads may open the same device simultaneously, with exclusive
access guaranteed by iommufd_bind_device();
- Open() after successful binding is rejected, since normal fops has been
activated. This is checked upon vdev->iommu_dev;

for group 2/3) are done together in VFIO_GROUP_GET_DEVICE_FD:

- 2) is done by open coding bind_iommufd + attach_ioas. Create an
iommufd_device object and record it to vdev->iommu_dev
- 3) is done by calling .finalize_device();
- open() after a valid vdev->iommu_dev is rejected. this also ensures
exclusive ownership with the nongroup path.

If Alex also agrees with it, this might be another mini-series to be merged
(just for group path) before this one. Doing so sort of nullifies the existing
group/container attaching process, where attach_ioas will be skipped and
now the security context is established when the device is opened.

Thanks
Kevin