Re: [PATCH 7/8] vfio/mdev: Fix aborting mdev child device removal if one fails
From: Alex Williamson
Date: Mon Mar 25 2019 - 16:49:43 EST
On Tue, 26 Mar 2019 01:05:34 +0530
Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> On 3/23/2019 4:50 AM, Parav Pandit wrote:
> > device_for_each_child() stops executing callback function for remaining
> > child devices, if callback hits an error.
> > Each child mdev device is independent of each other.
> > While unregistering parent device, mdev core must remove all child mdev
> > devices.
> > Therefore, mdev_device_remove_cb() always returns success so that
> > device_for_each_child doesn't abort if one child removal hits error.
> >
>
> When unregistering parent device, force_remove is set to true amd
> mdev_device_remove_ops() always returns success.
Can we know that? mdev_device_remove() doesn't guarantee to return
zero.
> > While at it, improve remove and unregister functions for below simplicity.
> >
> > There isn't need to pass forced flag pointer during mdev parent
> > removal which invokes mdev_device_remove().
>
> There is a need to pass the flag, pasting here the comment above
> mdev_device_remove_ops() which explains why the flag is needed:
>
> /*
> * mdev_device_remove_ops gets called from sysfs's 'remove' and when parent
> * device is being unregistered from mdev device framework.
> * - 'force_remove' is set to 'false' when called from sysfs's 'remove'
> which
> * indicates that if the mdev device is active, used by VMM or userspace
> * application, vendor driver could return error then don't remove the
> device.
> * - 'force_remove' is set to 'true' when called from
> mdev_unregister_device()
> * which indicate that parent device is being removed from mdev device
> * framework so remove mdev device forcefully.
> */
I don't see that this changes the force behavior, it's simply noting
that in order to continue the device_for_each_child() iterator, we need
to return zero, regardless of what mdev_device_remove() returns, and
the parent remove path is the only caller of mdev_device_remove_cb(),
so we can assume force = true when calling mdev_device_remove(). Aside
from maybe a WARN_ON if mdev_device_remove() returns non-zero, that
much looks reasonable to me.
> So simplify the flow.
> >
> > mdev_device_remove() is called from two paths.
> > 1. mdev_unregister_driver()
> > mdev_device_remove_cb()
> > mdev_device_remove()
> > 2. remove_store()
> > mdev_device_remove()
> >
> > When device is removed by user using remote_store(), device under
> > removal is mdev device.
> > When device is removed during parent device removal using generic child
> > iterator, mdev check is already done using dev_is_mdev().
> >
> > Hence, remove the unnecessary loop in mdev_device_remove().
I don't think knowing the device type is the only reason for this loop
though. Both paths you mention above can race with each other, so we
need to serialize them and pick a winner. The mdev_list_lock allows us
to do that. Additionally...
> >
> > Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> > Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx>
> > ---
> > drivers/vfio/mdev/mdev_core.c | 24 +++++-------------------
> > 1 file changed, 5 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> > index ab05464..944a058 100644
> > --- a/drivers/vfio/mdev/mdev_core.c
> > +++ b/drivers/vfio/mdev/mdev_core.c
> > @@ -150,10 +150,10 @@ static int mdev_device_remove_ops(struct mdev_device *mdev, bool force_remove)
> >
> > static int mdev_device_remove_cb(struct device *dev, void *data)
> > {
> > - if (!dev_is_mdev(dev))
> > - return 0;
> > + if (dev_is_mdev(dev))
> > + mdev_device_remove(dev, true);
> >
> > - return mdev_device_remove(dev, data ? *(bool *)data : true);
> > + return 0;
> > }
> >
> > /*
> > @@ -241,7 +241,6 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
> > void mdev_unregister_device(struct device *dev)
> > {
> > struct mdev_parent *parent;
> > - bool force_remove = true;
> >
> > mutex_lock(&parent_list_lock);
> > parent = __find_parent_device(dev);
> > @@ -255,8 +254,7 @@ void mdev_unregister_device(struct device *dev)
> > list_del(&parent->next);
> > class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
> >
> > - device_for_each_child(dev, (void *)&force_remove,
> > - mdev_device_remove_cb);
> > + device_for_each_child(dev, NULL, mdev_device_remove_cb);
> >
> > parent_remove_sysfs_files(parent);
> >
> > @@ -346,24 +344,12 @@ int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid)
> >
> > int mdev_device_remove(struct device *dev, bool force_remove)
> > {
> > - struct mdev_device *mdev, *tmp;
> > + struct mdev_device *mdev;
> > struct mdev_parent *parent;
> > struct mdev_type *type;
> > int ret;
> >
> > mdev = to_mdev_device(dev);
> > -
> > - mutex_lock(&mdev_list_lock);
Acquiring the lock is removed, but...
> > - list_for_each_entry(tmp, &mdev_list, next) {
> > - if (tmp == mdev)
> > - break;
> > - }
> > -
> > - if (tmp != mdev) {
> > - mutex_unlock(&mdev_list_lock);
> > - return -ENODEV;
> > - }
> > -
> > if (!mdev->active) {
> > mutex_unlock(&mdev_list_lock);
> > return -EAGAIN;
> >
We still release it in this path and the code below here. If we don't
find the device on the list under lock, then we're working with a stale
device and playing with the 'active' flag of that device outside of any
sort of mutual exclusion is racy. Thanks,
Alex