Re: [RFC 03/20] vfio: Add vfio_[un]register_device()

From: Jason Gunthorpe
Date: Wed Sep 29 2021 - 08:22:37 EST


On Wed, Sep 29, 2021 at 12:46:14PM +1000, david@xxxxxxxxxxxxxxxxxxxxx wrote:
> On Tue, Sep 21, 2021 at 10:00:14PM -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 22, 2021 at 12:54:02AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > > Sent: Wednesday, September 22, 2021 12:01 AM
> > > >
> > > > > One open about how to organize the device nodes under
> > > > > /dev/vfio/devices/. This RFC adopts a simple policy by keeping a
> > > > > flat layout with mixed devname from all kinds of devices. The
> > > > > prerequisite of this model is that devnames from different bus
> > > > > types are unique formats:
> > > >
> > > > This isn't reliable, the devname should just be vfio0, vfio1, etc
> > > >
> > > > The userspace can learn the correct major/minor by inspecting the
> > > > sysfs.
> > > >
> > > > This whole concept should disappear into the prior patch that adds the
> > > > struct device in the first place, and I think most of the code here
> > > > can be deleted once the struct device is used properly.
> > > >
> > >
> > > Can you help elaborate above flow? This is one area where we need
> > > more guidance.
> > >
> > > When Qemu accepts an option "-device vfio-pci,host=DDDD:BB:DD.F",
> > > how does Qemu identify which vfio0/1/... is associated with the specified
> > > DDDD:BB:DD.F?
> >
> > When done properly in the kernel the file:
> >
> > /sys/bus/pci/devices/DDDD:BB:DD.F/vfio/vfioX/dev
> >
> > Will contain the major:minor of the VFIO device.
> >
> > Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat
> > that the major:minor matches.
> >
> > In the above pattern "pci" and "DDDD:BB:DD.F" are the arguments passed
> > to qemu.
>
> I thought part of the appeal of the device centric model was less
> grovelling around in sysfs for information. Using type/address
> directly in /dev seems simpler than having to dig around matching
> things here.

I would say more regular grovelling. Starting from a sysfs device
directory and querying the VFIO cdev associated with it is much more
normal than what happens today, which also includes passing sysfs
information into an ioctl :\

> Note that this doesn't have to be done in kernel: you could have the
> kernel just call them /dev/vfio/devices/vfio0, ... but add udev rules
> that create symlinks from say /dev/vfio/pci/DDDD:BB:SS.F ->
> ../devices/vfioXX based on the sysfs information.

This is the right approach if people want to do this, but I'm not sure
it is worth it given backwards compat requires the sysfs path as
input. We may as well stick with sysfs as the command line interface
for userspace tools.

And I certainly don't want to see userspace tools trying to reverse a
sysfs path into a /dev/ symlink name when they can directly and
reliably learn the correct cdev from the sysfs path.

Jason