Re: [PATCH 4/5] clk: Get runtime PM before walking tree during disable_unused

From: Stephen Boyd
Date: Mon Mar 25 2024 - 18:18:21 EST


Quoting Doug Anderson (2024-03-25 09:19:37)
> Hi,
>
> On Sun, Mar 24, 2024 at 10:44 PM Stephen Boyd <sboyd@xxxxxxxxxx> wrote:
> >
> > Introduce a list of clk_core structures that have been registered, or
> > are in the process of being registered, that require runtime PM to
> > operate. Iterate this list and call clk_pm_runtime_get() on each of them
> > without holding the prepare_lock during clk_disable_unused(). This way
> > we can be certain that the runtime PM state of the devices will be
> > active and resumed so we can't schedule away while walking the clk tree
> > with the prepare_lock held. Similarly, call clk_pm_runtime_put() without
> > the prepare_lock held to properly drop the runtime PM reference.
>
> There's a part of me that worries about the fact that we'll now be
> doing a pm_runtime get() on _all clocks_ (even those that are used) at
> bootup now. I worry that some device out there will be unhappy about
> it. ...but I guess the device passed in here is already documented to
> be one that the clock framework can get/put whenever it needs to
> prepare the clock, so that makes me feel like it should be fine.
>
> Anyway, no action item, just documenting my thoughts...
>
> Oh, funny. After reading the next patch, I guess I'm even less
> concerned. I guess we were already grabbing the pm_runtime state for
> all clocks while printing the clock summary. While that's a debugfs
> function, it's still something that many people have likely exercised
> and it's likely not going to introduce random/long tail problems.
>
>
> > +/*
> > + * Call clk_pm_runtime_get() on all runtime PM enabled clks in the clk tree so
> > + * that disabling unused clks avoids a deadlock where a device is runtime PM
> > + * resuming/suspending and the runtime PM callback is trying to grab the
> > + * prepare_lock for something like clk_prepare_enable() while
> > + * clk_disable_unused_subtree() holds the prepare_lock and is trying to runtime
> > + * PM resume/suspend the device as well.
> > + */
> > +static int clk_pm_runtime_get_all(void)
>
> nit: It'd be nice if this documented that it acquired / held the lock.
> Could be in comments, or, might as well use the syntax like this (I
> think):
>
> __acquires(&clk_rpm_list_lock);
>
> ...similar with the put function.

I had that but removed it because on the error path we drop the lock and
sparse complains. I don't know how to signal that the lock is held
unless an error happens, but I'm a little out of date on sparse now.

>
>
> > + /*
> > + * Runtime PM "get" all the devices that are needed for the clks
> > + * currently registered. Do this without holding the prepare_lock, to
> > + * avoid the deadlock.
> > + */
> > + hlist_for_each_entry(core, &clk_rpm_list, rpm_node) {
> > + ret = clk_pm_runtime_get(core);
> > + if (ret) {
> > + failed = core;
> > + pr_err("clk: Failed to runtime PM get '%s' for clk '%s'\n",
> > + failed->name, dev_name(failed->dev));
>
> If I'm reading this correctly, the strings are backward in your error
> print. Right now you're printing:
>
> clk: Failed to runtime PM get '<clk_name>' for clk '<dev_name>'

Good catch. Thanks!

>
> With the printout fixed and some type of documentation that
> clk_pm_runtime_get_all() and clk_pm_runtime_put_all() grab/release the
> mutex:
>
> Reviewed-by: Douglas Anderson <dianders@xxxxxxxxxxxx>