Re: [PATCHv1 6/7] vfio/mdev: Fix aborting mdev child device removal if one fails

From: Alex Williamson
Date: Tue Apr 02 2019 - 18:33:15 EST


On Tue, 2 Apr 2019 19:59:58 +0000
Parav Pandit <parav@xxxxxxxxxxxx> wrote:

> > -----Original Message-----
> > From: Cornelia Huck <cohuck@xxxxxxxxxx>
> > Sent: Monday, April 1, 2019 12:39 PM
> > To: Parav Pandit <parav@xxxxxxxxxxxx>
> > Cc: kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > kwankhede@xxxxxxxxxx; alex.williamson@xxxxxxxxxx; cjia@xxxxxxxxxx
> > Subject: Re: [PATCHv1 6/7] vfio/mdev: Fix aborting mdev child device
> > removal if one fails
> >
> > On Tue, 26 Mar 2019 22:45:44 -0500
> > Parav Pandit <parav@xxxxxxxxxxxx> wrote:
> >
> > > device_for_each_child() stops executing callback function for
> > > remaining child devices, if callback hits an error.
> > > Each child mdev device is independent of each other.
> > > While unregistering parent device, mdev core must remove all child
> > > mdev devices.
> > > Therefore, mdev_device_remove_cb() always returns success so that
> >
> > s/always returns/must always return/ ?
> >
> Must always return.
> :-)
>
> > > device_for_each_child doesn't abort if one child removal hits error.
> > >
> > > While at it, improve remove and unregister functions for below simplicity.
> > >
> > > There isn't need to pass forced flag pointer during mdev parent
> > > removal which invokes mdev_device_remove(). So simplify the flow.
> > >
> > > mdev_device_remove() is called from two paths.
> > > 1. mdev_unregister_driver()
> > > mdev_device_remove_cb()
> > > mdev_device_remove()
> > > 2. remove_store()
> > > mdev_device_remove()
> > >
> > > When device is removed by user using remote_store(), device under
> > > removal is mdev device.
> > > When device is removed during parent device removal using generic
> > > child iterator, mdev check is already done using dev_is_mdev().
> >
> > Isn't there still a possible race condition (which you seem to address with
> > the following patch)? IOW, you cannot remove that loop-under-mutex yet?
>
> The loop checks if the remove() is called on the mdev or not.
> This is already checked from both the paths from remove is invoked.
> I didn't remove the 'active' check. So it should be fine.

I believe the loop was actually trying to sanitize the mdev pointer,
for example if it's not in our list of devices we should not even
de-reference 'active'. I think maybe this was more fallout from
allowing remove to fail. For instance, it seems like manipulating
active within the list lock critical section should provide us with
mutual exclusion, the mdev object should be valid until the sysfs
remove attribute is removed, but remove_store() itself removes that
attribute allowing mdev_remove_sysfs_files() to skip over it, but
mdev_remove_device() can fail on the remove_store() path causing it to
recreate the remove attribute. Now we're in trouble because I'm not
sure if recreating the sysfs attribute ever takes a reference to the
device. If it does, it's at least racy. Is it time to put the nail in
the coffin of these remove failure paths? It seems too fundamental to
our code base that drivers cannot do this. Thanks,

Alex

> > >
> > > Hence, remove the unnecessary loop in mdev_device_remove().
> > >
> > > Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> > > Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> > > Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx>
> > > ---
> > > drivers/vfio/mdev/mdev_core.c | 23 +++++------------------
> > > 1 file changed, 5 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/drivers/vfio/mdev/mdev_core.c
> > > b/drivers/vfio/mdev/mdev_core.c index 836d319..aefcf34 100644
> > > --- a/drivers/vfio/mdev/mdev_core.c
> > > +++ b/drivers/vfio/mdev/mdev_core.c
> > > @@ -149,10 +149,10 @@ static int mdev_device_remove_ops(struct
> > > mdev_device *mdev, bool force_remove)
> > >
> >
> > Maybe add
> >
> > /* only called during parent device unregistration */
> >
> > to avoid headscratching in the future?
> >
> > > static int mdev_device_remove_cb(struct device *dev, void *data) {
> > > - if (!dev_is_mdev(dev))
> > > - return 0;
> > > + if (dev_is_mdev(dev))
> > > + mdev_device_remove(dev, true);
> > >
> > > - return mdev_device_remove(dev, data ? *(bool *)data : true);
> > > + return 0;
> > > }
> > >
> > > /*
> > > @@ -240,7 +240,6 @@ int mdev_register_device(struct device *dev, const
> > > struct mdev_parent_ops *ops) void mdev_unregister_device(struct
> > > device *dev) {
> > > struct mdev_parent *parent;
> > > - bool force_remove = true;
> > >
> > > mutex_lock(&parent_list_lock);
> > > parent = __find_parent_device(dev);
> > > @@ -254,8 +253,7 @@ void mdev_unregister_device(struct device *dev)
> > > list_del(&parent->next);
> > > class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
> > >
> > > - device_for_each_child(dev, (void *)&force_remove,
> > > - mdev_device_remove_cb);
> > > + device_for_each_child(dev, NULL, mdev_device_remove_cb);
> > >
> > > parent_remove_sysfs_files(parent);
> > >
> >
> > Up to this chunk, the patch looks good to me.
> >
> > > @@ -348,24 +346,13 @@ int mdev_device_create(struct kobject *kobj,
> > >
> > > int mdev_device_remove(struct device *dev, bool force_remove) {
> > > - struct mdev_device *mdev, *tmp;
> > > + struct mdev_device *mdev;
> > > struct mdev_parent *parent;
> > > struct mdev_type *type;
> > > int ret;
> > >
> > > mdev = to_mdev_device(dev);
> > > -
> > > mutex_lock(&mdev_list_lock);
> > > - list_for_each_entry(tmp, &mdev_list, next) {
> > > - if (tmp == mdev)
> > > - break;
> > > - }
> > > -
> > > - if (tmp != mdev) {
> > > - mutex_unlock(&mdev_list_lock);
> > > - return -ENODEV;
> > > - }
> > > -
> > > if (!mdev->active) {
> > > mutex_unlock(&mdev_list_lock);
> > > return -EAGAIN;
>