Re: [PATCH v1 2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent
From: Rafael J. Wysocki
Date: Mon Dec 02 2024 - 16:45:26 EST
On Mon, Dec 2, 2024 at 9:46 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
>
> On Mon, Dec 2, 2024 at 12:16 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> >
> > On Mon, Dec 2, 2024 at 9:11 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> > >
> > > Sorry for the delay.
> > >
> > > On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > > >
> > > > Locking is not needed to do get_device(dev->parent). We either get a NULL
> > > > (if the parent was cleared) or the actual parent. Also, when a device is
> > > > deleted (device_del()) and removed from the dpm_list, its completion
> > > > variable is also complete_all()-ed. So, we don't have to worry about
> > > > waiting indefinitely on a deleted parent device.
> > >
> > > The device_pm_initialized(dev) check before get_device(dev->parent)
> > > doesn't make sense without the locking and that's the whole point of
> > > it.
> >
> > Hmm, not really.
> >
> > How is the parent prevented from going away in get_device() right
> > after the initial dev check without the locking?
>
> Not sure what you mean by "go away"? But get_device() is going to keep
> a non-zero refcount on the parent so that struct doesn't get freed.
That's after it has returned.
There is nothing to prevent dev from being freed in get_device()
itself before the kobject_get(&dev->kobj) call.
So when get_device() is called, there needs to be an active ref on the
device already or something else to prevent the target device object
from being freed.
In this particular case it is the lock along with the
device_pm_initialized(dev) check on the child.
> The parent itself can still "go away" in terms of unbinding or
> removing the children from the dpm_list(). But that's what the
> device_pm_initialized() check is for. When a device_del() is called,
> it's removed from the dpm_list. The actual freeing comes later. But we
> aren't/weren't checking for that anyway.
>
> >
> > > > Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx>
> > > > ---
> > > > drivers/base/power/main.c | 13 ++-----------
> > > > 1 file changed, 2 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > > index 86e51b9fefab..9b9b6088e56a 100644
> > > > --- a/drivers/base/power/main.c
> > > > +++ b/drivers/base/power/main.c
> > > > @@ -284,18 +284,9 @@ static bool dpm_wait_for_superior(struct device *dev, bool async)
> > > > * counting the parent once more unless the device has been deleted
> > > > * already (in which case return right away).
> > > > */
> > > > - mutex_lock(&dpm_list_mtx);
> > > > -
> > > > - if (!device_pm_initialized(dev)) {
> > > > - mutex_unlock(&dpm_list_mtx);
> > > > - return false;
> > > > - }
> > > > -
> > > > parent = get_device(dev->parent);
> > > > -
> > > > - mutex_unlock(&dpm_list_mtx);
> > > > -
> > > > - dpm_wait(parent, async);
> > > > + if (device_pm_initialized(dev))
> > > > + dpm_wait(parent, async);
> > >
> > > This is racy, so what's the point?
> > >
> > > You can just do
> > >
> > > parent = get_device(dev->parent);
> > > dpm_wait(parent, async);
>
> Parent struct device being there isn't the same as whether this device
> is in the dpm_list? So, shouldn't we still check this?
>
> Also, is it really racy anymore with the new algorithm? We don't kick
> off the subordinates until after the parent is done.
>
> > >
> > > and please update the comment above this.
>
> Will do.
>
> Thanks,
> Saravana
> > >
> > > > put_device(parent);
> > > >
> > > > dpm_wait_for_suppliers(dev, async);
> > > > --