Re: [PATCH 2/2] vfio/mdev: don't warn if ->request is not set

From: Alex Williamson
Date: Tue Jul 27 2021 - 15:25:46 EST


On Tue, 27 Jul 2021 16:03:17 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Tue, Jul 27, 2021 at 12:53:09PM -0600, Alex Williamson wrote:
> > On Tue, 27 Jul 2021 14:32:09 -0300
> > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> >
> > > On Tue, Jul 27, 2021 at 08:04:16AM +0200, Cornelia Huck wrote:
> > > > On Mon, Jul 26 2021, Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:
> > > >
> > > > > On Mon, 26 Jul 2021 20:09:06 -0300
> > > > > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> > > > >
> > > > >> On Mon, Jul 26, 2021 at 07:07:04PM +0200, Cornelia Huck wrote:
> > > > >>
> > > > >> > But I wonder why nobody else implements this? Lack of surprise removal?
> > > > >>
> > > > >> The only implementation triggers an eventfd that seems to be the same
> > > > >> eventfd as the interrupt..
> > > > >>
> > > > >> Do you know how this works in userspace? I'm surprised that the
> > > > >> interrupt eventfd can trigger an observation that the kernel driver
> > > > >> wants to be unplugged?
> > > > >
> > > > > I think we're talking about ccw, but I see QEMU registering separate
> > > > > eventfds for each of the 3 IRQ indexes and the mdev driver specifically
> > > > > triggering the req_trigger...? Thanks,
> > > > >
> > > > > Alex
> > > >
> > > > Exactly, ccw has a trigger for normal I/O interrupts, CRW (machine
> > > > checks), and this one.
> > >
> > > If it is a dedicated eventfd for 'device being removed' why is it in
> > > the CCW implementation and not core code?
> >
> > The CCW implementation (likewise the vfio-pci implementation) owns
> > the IRQ index address space and the decision to make this a signal
> > to userspace rather than perhaps some handling a device might be
> > able to do internally.
>
> The core code holds the vfio_device_get() so long as the FD is
> open. There is no way to pass the wait_for_completion without
> userspace closing the FD, so there isn't really much choice for the
> drivers to do beyond signal to userspace to close the FD??
>
> > For instance an alternate vfio-pci implementation might zap all
> > mmaps, block all r/w access, and turn this into a surprise removal.
>
> This is nice, but wouldn't close the FD, so needs core changes
> anyhow..

Right, the core would need to be able to handle an FD disconnected from
the device, so some core changes would clearly be required.

> > Another implementation might be more aggressive and send SIGKILL
> > to the user process.
>
> We don't try to revoke FDs from the kernel, it is racy, dangerous and
> unreliable.

I'm not sure how trying to kill the process using an open file becomes
a revoke... In fact, the surprise hotplug might just be able to zap
mmaps and wait for userspace to generate a SIGBUS.

> > This was the thought behind why vfio-core triggers the driver
> > request callback with a counter, leaving the policy to the driver.
>
> IMHO subsystem policy does not belong in drivers. Down that road lies
> a mess for userspace.

I think my argument was that to this point it's been driver policy, not
subsystem policy. The subsystem policy is to block until the device is
released; it's driver policy whether it has a means to implement
something to expedite that. Thanks,

Alex
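
For illustration only, the split discussed in this thread (the core blocks
until the device is released; the driver may provide an optional ->request
callback to expedite that) can be modelled in userspace C. This is a sketch
under stated assumptions, not the vfio code: every model_* name, the
max_rounds bound, and the single-threaded stand-in for userspace are
hypothetical. Only eventfd(2), read(2) and write(2) are real interfaces.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/eventfd.h>

struct model_device;

struct model_ops {
	/* Optional, may be NULL: the driver's way to expedite release. */
	void (*request)(struct model_device *dev, unsigned int count);
};

struct model_device {
	const struct model_ops *ops;
	int refs;        /* held for as long as the user keeps the FD open */
	int req_eventfd; /* eventfd registered by the user, or -1 */
};

/* One possible driver policy: signal the user's request eventfd. */
static void model_request_signal(struct model_device *dev, unsigned int count)
{
	uint64_t one = 1;

	(void)count; /* a real driver might escalate based on the counter */
	if (dev->req_eventfd >= 0 &&
	    write(dev->req_eventfd, &one, sizeof(one)) != sizeof(one))
		perror("write");
}

/* Stand-in for userspace: on a request event, close the device FD. */
static void model_user_poll(struct model_device *dev)
{
	uint64_t val;

	if (dev->req_eventfd >= 0 &&
	    read(dev->req_eventfd, &val, sizeof(val)) == sizeof(val))
		dev->refs--;
}

/*
 * Subsystem policy: keep waiting until the device is released, calling
 * the driver's request callback (if any) with an increasing counter.
 * Returns the number of rounds needed, or -1 once max_rounds is reached
 * (the bound exists only so that the model terminates).
 */
static int model_unregister(struct model_device *dev, int max_rounds)
{
	unsigned int count = 0;

	assert(dev && dev->ops);
	while (dev->refs > 0) {
		if ((int)count >= max_rounds)
			return -1;
		if (dev->ops->request)
			dev->ops->request(dev, count);
		count++;
		model_user_poll(dev);
	}
	return (int)count;
}
```

With a request callback set, model_unregister() returns after one round,
because the stand-in user sees the eventfd and drops its reference. With
->request left NULL, nothing prompts the user to let go, so the model only
returns via its round limit; that corresponds to the core blocking until
the FD is closed.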