Re: [PATCH v8 09/26] kernel/cpu_pm: Manage runtime PM in the idle path for CPUs

From: Rafael J. Wysocki
Date: Wed Jul 18 2018 - 06:12:52 EST


On Wednesday, June 20, 2018 7:22:09 PM CEST Ulf Hansson wrote:
> To allow CPUs being power managed by PM domains, let's deploy support for
> runtime PM for the CPU's corresponding struct device.
>
> More precisely, at the point when the CPU is about to enter an idle state,
> decrease the runtime PM usage count for its corresponding struct device,
> via calling pm_runtime_put_sync_suspend(). Then, at the point when the CPU
> resumes from idle, let's increase the runtime PM usage count, via calling
> pm_runtime_get_sync().
>
> Cc: Lina Iyer <ilina@xxxxxxxxxxxxxx>
> Co-developed-by: Lina Iyer <lina.iyer@xxxxxxxxxx>
> Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>

I finally got to this one, sorry for the huge delay.

Let me confirm that I understand the code flow correctly.

> ---
> kernel/cpu_pm.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/kernel/cpu_pm.c b/kernel/cpu_pm.c
> index 67b02e138a47..492d4a83dca0 100644
> --- a/kernel/cpu_pm.c
> +++ b/kernel/cpu_pm.c
> @@ -16,9 +16,11 @@
> */
>
> #include <linux/kernel.h>
> +#include <linux/cpu.h>
> #include <linux/cpu_pm.h>
> #include <linux/module.h>
> #include <linux/notifier.h>
> +#include <linux/pm_runtime.h>
> #include <linux/spinlock.h>
> #include <linux/syscore_ops.h>
>
> @@ -91,6 +93,7 @@ int cpu_pm_enter(void)

This is called from a cpuidle driver's ->enter callback for the target state
selected by the idle governor ->

> {
> int nr_calls;
> int ret = 0;
> + struct device *dev = get_cpu_device(smp_processor_id());
>
> ret = cpu_pm_notify(CPU_PM_ENTER, -1, &nr_calls);
> if (ret)
> @@ -100,6 +103,9 @@ int cpu_pm_enter(void)
> */
> cpu_pm_notify(CPU_PM_ENTER_FAILED, nr_calls - 1, NULL);
>
> + if (!ret && dev && dev->pm_domain)
> + pm_runtime_put_sync_suspend(dev);

-> so this is going to invoke genpd_runtime_suspend() if the usage
counter of dev is 0.

That will cause cpu_power_down_ok() to be called (because this is
a CPU domain) and that will walk the domain cpumask and compute the
estimated idle duration as the minimum of tick_nohz_get_next_wakeup()
values over the CPUs in that cpumask. [Note that the weight of the
cpumask must be seriously limited for that to actually work, as this
happens in the idle path.] Next, it will return "true" if it can
find a domain state with residency within the estimated idle
duration. [Note that this sort of overlaps with the idle governor's
job.]

Next, __genpd_runtime_suspend() will be invoked to run the device-specific
callback if any [Note that this has to be suitable for the idle path if
present.] and genpd_stop_dev() runs (which, again, may invoke a callback)
and genpd_power_off() runs under the domain lock (which must be a spinlock
then).

> +
> return ret;
> }
> EXPORT_SYMBOL_GPL(cpu_pm_enter);
> @@ -118,6 +124,11 @@ EXPORT_SYMBOL_GPL(cpu_pm_enter);
> */
> int cpu_pm_exit(void)
> {
> + struct device *dev = get_cpu_device(smp_processor_id());
> +
> + if (dev && dev->pm_domain)
> + pm_runtime_get_sync(dev);
> +
> return cpu_pm_notify(CPU_PM_EXIT, -1, NULL);
> }
> EXPORT_SYMBOL_GPL(cpu_pm_exit);
>

And this is called on wakeup when the cpuidle driver's ->enter callback
is about to return and it reverses the suspend flow (except that the
governor doesn't need to be called now).

Have I got that right?