RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

From: Parav Pandit
Date: Tue Mar 26 2019 - 01:53:32 EST




> -----Original Message-----
> From: linux-kernel-owner@xxxxxxxxxxxxxxx <linux-kernel-
> owner@xxxxxxxxxxxxxxx> On Behalf Of Parav Pandit
> Sent: Monday, March 25, 2019 10:19 PM
> To: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Cc: kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> kwankhede@xxxxxxxxxx
> Subject: RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
>
>
>
> > -----Original Message-----
> > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > Sent: Monday, March 25, 2019 9:17 PM
> > To: Parav Pandit <parav@xxxxxxxxxxxx>
> > Cc: kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > kwankhede@xxxxxxxxxx
> > Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> >
> > On Tue, 26 Mar 2019 01:43:44 +0000
> > Parav Pandit <parav@xxxxxxxxxxxx> wrote:
> >
> > > > -----Original Message-----
> > > > From: Alex Williamson <alex.williamson@xxxxxxxxxx>

> > > > I mean the callback iterator on the parent remove can do a WARN_ON
> > > > if this returns an error while the device remove path can silently
> > > > return -EBUSY, the common function doesn't need to decide whether
> > > > the parent ops remove function deserves a dev_err.
> > > >
> > > Ok. I understood.
> > > But device remove returning silent -EBUSY looks an error that should
> > > get logged in, because this is something not expected. Its probably
> > > late for sysfs layer to return report an error by that time it
> > > prints device name, because put_device() is done. So if remove()
> > > returns an error, I think its legitimate failure to do WARN_ON or
> dev_err().
> >
> > Calling put_device() if the parent remove op fails looks like a bug
> > introduced by this series, the current code allows that failure
> > leaving the device in a coherent state and returning errno to the sysfs
> store function.
> >
> Why should it fail?
> We are taking off the device bus first as describe in commit log.
> This ensures that everything is closed before calling the remove().
> We cannot avoid put_device() and put_parent, it all buggy path...

I audited remove() callbacks of kvmgt.c, vfio_ccw_ops.c, vfio_ap_ops.c, mbochs.c, mdpy.c, mtty.c, who makes the remove possible once the device release is executed.
This should complete once the device is taken off the bus.
This was not the case before this sequence where remove() is done while device is open...hence the check was needed in past.
dev_err() is to help catch any errors/bugs in this area.

I doubt we need to retry remove() like vfio_del_group_dev(), in mdev_core if release() is not yet complete.