Re: [PATCH 1/2] PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume()
From: Ulf Hansson
Date: Fri Jan 12 2018 - 09:07:43 EST
On 12 January 2018 at 14:10, Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>
> There are problems with calling pm_runtime_force_suspend/resume()
> to "stop" and "start" devices in genpd_finish_suspend() and
> genpd_resume_noirq() (and in analogous hibernation-specific genpd
> callbacks) after commit 122a22377a3d (PM / Domains: Stop/start
> devices during system PM suspend/resume in genpd) as those routines
> do much more than just "stopping" and "starting" devices (which was
> the stated purpose of that commit) unnecessarily and may not play
> well with system-wide PM driver callbacks.
>
> First, consider the pm_runtime_force_suspend() in
> genpd_finish_suspend(). If the current runtime PM status of the
> device is "suspended", that function most likely does the right thing
> by ignoring the device, because it should have been "stopped" already
> and whatever needed to be done to deactivate it shoud have been done.
> In turn, if the runtime PM status of the device is "active",
> genpd_runtime_suspend() is called for it (indirectly) and (1) runs
> the ->runtime_suspend callback provided by the device's driver
> (assuming no bus type with ->runtime_suspend of its own), (2) "stops"
> the device and (3) checks if the domain can be powered down, and then
> (4) the device's runtime PM status is changed to "suspended". Out of
> the four actions above (1) is not necessary and it may be outright
> harmful, (3) is pointless and (4) is questionable. The only
> operation that needs to be carried out here is (2).
>
> The reason why (1) is not necessary is because the system-wide
> PM callbacks provided by the device driver for the transition in
> question have been run and they should have taken care of the
> driver's part of device suspend already. Moreover, it may be
> harmful, because the ->runtime_suspend callback may want to
> access the device which is partially suspended at that point
> and may not be responsive. Also, system-wide PM callbacks may
> have been run already (in the previous phases of the system
> transition under way) for the device's parent or for its supplier
> devices (if any) and the device may not be accessible because of
> that.
>
> There also is no reason to do (3), because genpd_finish_suspend()
> will repeat it anyway, and (4) potentially causes confusion to ensue
> during the subsequent system transition to the working state.
>
> Consider pm_runtime_force_resume() in genpd_resume_noirq() now.
> It runs genpd_runtime_resume() for all devices with runtime PM
> status set to "suspended", which includes all of the devices
> whose runtime PM status was changed by pm_runtime_force_suspend()
> before and may include some devices already suspended when the
> pm_runtime_force_suspend() was running, which may be confusing. The
> genpd_runtime_resume() first tries to power up the domain, which
> (again) is pointless, because genpd_resume_noirq() has done that
> already. Then, it "starts" the device and runs the ->runtime_resume
> callback (from the driver, say) for it. If all is well, the device
> is left with the runtime PM status set to "active".
>
> Unfortunately, running the driver's ->runtime_resume callback
> before its system-wide PM callbacks and possibly before some
> system-wide PM callbacks of the parent device's driver (let
> alone supplier drivers) is asking for trouble, especially if
> the device had been suspended before pm_runtime_force_suspend()
> ran previously or if the callbacks in question expect to be run
> back-to-back with their suspend-side counterparts. It also should
> not be necessary, because the system-wide PM driver callbacks that
> will be invoked for the device subsequently should take care of
> resuming it just fine.
>
> [Running the driver's ->runtime_resume callback in the "noirq"
> phase of the transition to the working state may be problematic
> even for devices whose drivers do use pm_runtime_force_resume()
> in (or as) their system-wide PM callbacks if they have suppliers
> other than their parents, because it may cause the supplier to
> be resumed after the consumer in some cases.]
>
> Because of the above, modify genpd as follows:
>
> 1. Change genpd_finish_suspend() to only "stop" devices with
> runtime PM status set to "active" (without invoking runtime PM
> callbacks for them, changing their runtime PM status and so on).
>
> That doesn't change the handling of devices whose drivers use
> pm_runtime_force_suspend/resume() in (or as) their system-wide
> PM callbacks and addresses the issues described above for the
> other devices.
>
> 2. Change genpd_resume_noirq() to only "start" devices with
> runtime PM status set to "active" (without invoking runtime PM
> callbacks for them, changing their runtime PM status and so on).
>
> Again, that doesn't change the handling of devices whose drivers
> use pm_runtime_force_suspend/resume() in (or as) their system-wide
> PM callbacks and addresses the described issues for the other
> devices. Devices with runtime PM status set to "suspended"
> are not started with the assumption that they will be resumed
> later, either by pm_runtime_force_resume() or via runtime PM.
>
> 3. Change genpd_restore_noirq() to follow genpd_resume_noirq().
>
> That causes devices already suspended before hibernation to be
> left alone (which also is the case without the change) and
> avoids running the ->runtime_resume driver callback too early
> for the other devices.
>
> 4. Change genpd_freeze_noirq() and genpd_thaw_noirq() in accordance
> with the above modifications.
>
> Fixes: 122a22377a3d (PM / Domains: Stop/start devices during system PM suspend/resume in genpd)
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
That was an extensive changlog, thanks for the details and for working on this!
Acked-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
Kind regards
Uffe
> ---
> drivers/base/power/domain.c | 25 +++++++++++++++----------
> 1 file changed, 15 insertions(+), 10 deletions(-)
>
> Index: linux-pm/drivers/base/power/domain.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/domain.c
> +++ linux-pm/drivers/base/power/domain.c
> @@ -1048,8 +1048,9 @@ static int genpd_finish_suspend(struct d
> if (dev->power.wakeup_path && genpd_is_active_wakeup(genpd))
> return 0;
>
> - if (genpd->dev_ops.stop && genpd->dev_ops.start) {
> - ret = pm_runtime_force_suspend(dev);
> + if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> + !pm_runtime_status_suspended(dev)) {
> + ret = genpd_stop_dev(genpd, dev);
> if (ret) {
> if (poweroff)
> pm_generic_restore_noirq(dev);
> @@ -1106,8 +1107,9 @@ static int genpd_resume_noirq(struct dev
> genpd->suspended_count--;
> genpd_unlock(genpd);
>
> - if (genpd->dev_ops.stop && genpd->dev_ops.start) {
> - ret = pm_runtime_force_resume(dev);
> + if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> + !pm_runtime_status_suspended(dev)) {
> + ret = genpd_start_dev(genpd, dev);
> if (ret)
> return ret;
> }
> @@ -1139,8 +1141,9 @@ static int genpd_freeze_noirq(struct dev
> if (ret)
> return ret;
>
> - if (genpd->dev_ops.stop && genpd->dev_ops.start)
> - ret = pm_runtime_force_suspend(dev);
> + if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> + !pm_runtime_status_suspended(dev))
> + ret = genpd_stop_dev(genpd, dev);
>
> return ret;
> }
> @@ -1163,8 +1166,9 @@ static int genpd_thaw_noirq(struct devic
> if (IS_ERR(genpd))
> return -EINVAL;
>
> - if (genpd->dev_ops.stop && genpd->dev_ops.start) {
> - ret = pm_runtime_force_resume(dev);
> + if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> + !pm_runtime_status_suspended(dev)) {
> + ret = genpd_start_dev(genpd, dev);
> if (ret)
> return ret;
> }
> @@ -1221,8 +1225,9 @@ static int genpd_restore_noirq(struct de
> genpd_sync_power_on(genpd, true, 0);
> genpd_unlock(genpd);
>
> - if (genpd->dev_ops.stop && genpd->dev_ops.start) {
> - ret = pm_runtime_force_resume(dev);
> + if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> + !pm_runtime_status_suspended(dev)) {
> + ret = genpd_start_dev(genpd, dev);
> if (ret)
> return ret;
> }
>