Re: [Resend][PATCH] PM: Introduce core framework for run-time PM ofI/O devices (rev. 11)

From: Alan Stern
Date: Tue Aug 04 2009 - 16:33:48 EST


On Mon, 3 Aug 2009, Rafael J. Wysocki wrote:

> Hi,
>
> OK, if this is to go into 2.6.32, the last moment for putting it into
> linux-next is now. If you have any objections, remarks, etc. please let me
> know or I'm going to put this one into the linux-next branch of the suspend-2.6
> tree in the next couple of days.

I'm sorry I haven't been keeping on top of all your work on this.
Lots of other stuff has been going on in the meantime...

One the whole this all looks very good. It's basically ready to be
merged. There are a couple of minor issues remaining plus a bunch of
unimportant implementation details.

pm_runtime_disable() gets used for several different purposes. For
the usage in pm_runtime_remove(), it's silly to carry out a pending
resume request. Should we add an argument saying whether or not to do
so?

In the documentation, it would be nice to have a section listing the
default runtime PM settings and explaining what a driver should do to
activate runtime PM on a newly-registered device.


> +static void pm_runtime_cancel_pending(struct device *dev)
> +{
> + pm_runtime_deactivate_timer(dev);
> + /*
> + * If there's a request pending, make sure its work function will return
> + * without doing anything.
> + */
> + if (dev->power.request_pending)
> + dev->power.request = RPM_REQ_NONE;

No need for the "if"; you can always do the assignment.


> +static int __pm_runtime_idle(struct device *dev)
> +{
> + int retval = 0;
> +
> + if (dev->power.runtime_failure)
> + retval = -EINVAL;

Instead of assigning to retval, you could simply return these values.

> + else if (dev->power.idle_notification)
> + retval = -EINPROGRESS;
> + else if (atomic_read(&dev->power.usage_count) > 0
> + || dev->power.disable_depth > 0
> + || dev->power.timer_expires > 0
> + || dev->power.runtime_status == RPM_SUSPENDED
> + || dev->power.runtime_status == RPM_SUSPENDING)
> + retval = -EAGAIN;

Do we also want to rule out RPM_RESUMING? That is,

|| dev->power.runtime_status != RPM_ACTIVE

> + spin_unlock_irq(&dev->power.lock);
> +
> + if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_idle)
> + dev->bus->pm->runtime_idle(dev);
> +
> + spin_lock_irq(&dev->power.lock);

Small optimization: Put the spin_{un}lock_irq stuff inside the "if"
statement, so it doesn't happen if the test fails. The same thing can
be done in other places.


> + * __pm_runtime_suspend - Carry out run-time suspend of given device.
> + * @dev: Device to suspend.
> + * @from_wq: If set, the funtion has been called via pm_wq.

Fix spelling of "function". Likewise in __pm_runtime_resume.

> +int __pm_runtime_suspend(struct device *dev, bool from_wq)
> +{
...
> + spin_unlock_irq(&dev->power.lock);
> +
> + if (parent && !parent->power.ignore_children)
> + pm_request_idle(parent);
> +
> + if (notify)
> + pm_runtime_idle(dev);

Move this up before the spin_unlock_irq and call __pm_runtime_idle instead.
The same sort of thing can be done in __pm_runtime_resume.


> +static int __pm_request_suspend(struct device *dev)
> +{
> + int retval = 0;
> +
> + if (dev->power.runtime_failure)
> + return -EINVAL;
> +
> + if (dev->power.runtime_status == RPM_SUSPENDED)
> + retval = 1;
> + else if (atomic_read(&dev->power.usage_count) > 0
> + || dev->power.disable_depth > 0)
> + retval = -EAGAIN;
> + else if (dev->power.runtime_status == RPM_SUSPENDING)
> + retval = -EINPROGRESS;
> + else if (!pm_children_suspended(dev))
> + retval = -EBUSY;

Insert:
if (retval)
return retval;

Or else change the assignments to "return" statements. Yes, we agreed
that a suspend request should override an existing idle-notify
request. But if the new request fails then it shouldn't override
anything. (Of course, if it fails for any of the reasons here then
there can't be a pending idle-notify request anyway.)

> + pm_runtime_deactivate_timer(dev);
> +
> + if (dev->power.request_pending) {
> + /*
> + * Pending resume requests take precedence over us, but we can
> + * overtake any other pending request.
> + */
> + if (dev->power.request == RPM_REQ_RESUME)
> + retval = -EAGAIN;
> + else if (dev->power.request != RPM_REQ_SUSPEND)
> + dev->power.request = retval ?
> + RPM_REQ_NONE : RPM_REQ_SUSPEND;

Now there's no need to check retval.

> +
> + if (dev->power.request == RPM_REQ_SUSPEND)
> + return 0;

Just simply:
return retval;

Some of these cases can't happen. For instance, if we reach here then
the status can't be SUSPENDED or SUSPENDING, so there can't be a
pending resume request.

> + }
> +
> + if (retval)
> + return retval;

Now this isn't needed. Similar code rearrangements can be made in
__pm_request_resume.


> +int __pm_request_resume(struct device *dev)

Should be static.


> +int __pm_runtime_set_status(struct device *dev, unsigned int status)
> +{
> + struct device *parent = dev->parent;
> + unsigned long flags;
> + int error = 0;
> +
> + if (status != RPM_ACTIVE && status != RPM_SUSPENDED)
> + return -EINVAL;

This should go inside the spinlocked area.

> + spin_lock_irqsave(&dev->power.lock, flags);
> +
> + if (!dev->power.runtime_failure && !dev->power.disable_depth)
> + goto out;
> +
> + if (dev->power.runtime_status == status)
> + goto out_clear;
> +
> + if (status == RPM_SUSPENDED) {
> + /* It always is possible to set the status to 'suspended'. */
> + if (parent)
> + atomic_add_unless(&parent->power.child_count, -1, 0);
> + dev->power.runtime_status = status;
> + goto out_clear;
> + }
> +
> + if (parent) {
> + spin_lock_irq(&parent->power.lock);
> +
> + /*
> + * It is invalid to put an active child under a parent that is
> + * not active, has run-time PM enabled and the
> + * 'power.ignore_children' flag unset.
> + */
> + if (!parent->power.disable_depth
> + && !parent->power.ignore_children
> + && parent->power.runtime_status != RPM_ACTIVE) {
> + error = -EBUSY;
> + } else {
> + if (dev->power.runtime_status == RPM_SUSPENDED)
> + atomic_inc(&parent->power.child_count);
> + dev->power.runtime_status = status;
> + }
> +
> + spin_unlock_irq(&parent->power.lock);
> +
> + if (error)
> + goto out;
> + } else {
> + dev->power.runtime_status = status;
> + }
> +
> + out_clear:
> + dev->power.runtime_failure = false;

Move all those assignments to dev->power.runtime_status down here.


> +int pm_runtime_disable(struct device *dev)
> +{
...
> + if (dev->power.disable_depth++ > 0)
> + goto out;
> +
> + if (dev->power.runtime_failure)
> + goto out;

I don't see why this is needed.

> +
> + pm_runtime_deactivate_timer(dev);
> +
> + if (dev->power.request_pending) {
> + dev->power.request = RPM_REQ_NONE;
> +
> + spin_unlock_irq(&dev->power.lock);
> +
> + cancel_work_sync(&dev->power.work);
> +
> + spin_lock_irq(&dev->power.lock);
> +
> + dev->power.request_pending = false;

Remove excessive whitespace.

> + }
> +
> + if (dev->power.runtime_status == RPM_SUSPENDING
> + || dev->power.runtime_status == RPM_RESUMING) {
> + DEFINE_WAIT(wait);
> +
> + /* Suspend or wake-up in progress. */
> + for (;;) {
> + prepare_to_wait(&dev->power.wait_queue, &wait,
> + TASK_UNINTERRUPTIBLE);
> + if (dev->power.runtime_status != RPM_SUSPENDING
> + && dev->power.runtime_status != RPM_RESUMING)
> + break;
> +
> + spin_unlock_irq(&dev->power.lock);
> +
> + schedule();
> +
> + spin_lock_irq(&dev->power.lock);
> + }
> + finish_wait(&dev->power.wait_queue, &wait);
> + }
> +
> + if (dev->power.runtime_failure)
> + goto out;
> +
> + if (dev->power.idle_notification) {
> + DEFINE_WAIT(wait);
> +
> + for (;;) {
> + prepare_to_wait(&dev->power.wait_queue, &wait,
> + TASK_UNINTERRUPTIBLE);
> + if (!dev->power.idle_notification)
> + break;
> +
> + spin_unlock_irq(&dev->power.lock);
> +
> + schedule();
> +
> + spin_lock_irq(&dev->power.lock);
> + }
> + finish_wait(&dev->power.wait_queue, &wait);
> + }

This wait loop should be merged with the previous loop.


> +void pm_runtime_init(struct device *dev)
> +{
> + spin_lock_init(&dev->power.lock);
> +
> + dev->power.runtime_status = RPM_SUSPENDED;
> + dev->power.idle_notification = false;
> +
> + dev->power.disable_depth = 1;
> + atomic_set(&dev->power.usage_count, 0);
> +
> + dev->power.runtime_failure = false;
> + dev->power.last_error = 0;

You don't have to set values to 0; they are initialized by kzalloc.

> + dev->power.suspend_timer.expires = jiffies;
> + dev->power.suspend_timer.data = (unsigned long)dev;
> + dev->power.suspend_timer.function = pm_suspend_timer_fn;

Use setup_timer() instead of assigning these fields directly.


> +void pm_runtime_remove(struct device *dev)
> +{
> + pm_runtime_disable(dev);
> +
> + if (dev->power.runtime_status == RPM_ACTIVE) {
> + struct device *parent = dev->parent;
> +
> + /*
> + * Change the status back to 'suspended' to match the initial
> + * status.
> + */
> + pm_runtime_set_suspended(dev);
> + if (parent && !parent->power.ignore_children)
> + pm_request_idle(parent);

Shouldn't these last two lines be part of __pm_runtime_set_status()?


> --- /dev/null
> +++ linux-2.6/include/linux/pm_runtime.h

> +extern void pm_runtime_init(struct device *dev);
> +extern void pm_runtime_remove(struct device *dev);

I don't like seeing these two functions included in the public header
file. It's enough to put them in drivers/base/power/power.h.


> --- linux-2.6.orig/drivers/base/power/main.c
> +++ linux-2.6/drivers/base/power/main.c
> @@ -49,6 +50,16 @@ static DEFINE_MUTEX(dpm_list_mtx);
> static bool transition_started;
>
> /**
> + * device_pm_init - Initialize the PM-related part of a device object
> + * @dev: Device object to initialize.
> + */
> +void device_pm_init(struct device *dev)
> +{
> + dev->power.status = DPM_ON;
> + pm_runtime_init(dev);
> +}
> +
> +/**
> * device_pm_lock - lock the list of active devices used by the PM core
> */
> void device_pm_lock(void)
> @@ -105,6 +116,8 @@ void device_pm_remove(struct device *dev
> mutex_lock(&dpm_list_mtx);
> list_del_init(&dev->power.entry);
> mutex_unlock(&dpm_list_mtx);
> +
> + pm_runtime_remove(dev);
> }

Calling pm_runtime_init() from device_pm_init() and
pm_runtime_remove() from device_pm_remove() isn't good. If
CONFIG_PM_SLEEP isn't enabled then the calls won't be compiled, even
if CONFIG_PM_RUNTIME is set.


> @@ -757,11 +771,15 @@ static int dpm_prepare(pm_message_t stat
> dev->power.status = DPM_PREPARING;
> mutex_unlock(&dpm_list_mtx);
>
> - error = device_prepare(dev, state);
> + if (pm_runtime_disable(dev) && device_may_wakeup(dev))
> + error = -EBUSY;

What's the reason for the -EBUSY error?


> --- linux-2.6.orig/drivers/base/dd.c
> +++ linux-2.6/drivers/base/dd.c
> @@ -202,7 +203,9 @@ int driver_probe_device(struct device_dr
> pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
> drv->bus->name, __func__, dev_name(dev), drv->name);
>
> + pm_runtime_get_noresume(dev);
> ret = really_probe(dev, drv);
> + pm_runtime_put_noidle(dev);

This is bad because it won't wait if there's a runtime-PM call in
progress. Also, we shouldn't use put_noidle because it might subvert
the driver's attempt to autosuspend. Instead we should do something
like this:

/* Wait for runtime PM calls to finish and prevent new calls
* until the probe is done.
*/
pm_runtime_disable(dev);
pm_runtime_get_noresume(dev);
pm_runtime_enable(dev):
ret = really_probe(dev, drv);
pm_runtime_put_sync(dev);


> --- /dev/null
> +++ linux-2.6/Documentation/power/runtime_pm.txt

> +2. Device Run-time PM Callbacks

> +In particular, it is recommended that ->runtime_suspend() return -EBUSY if
> +device_may_wakeup() returns 'false' for the device.

What's the point of this? I don't understand -- we don't want to
discourage people from suspending devices with wakeup enabled.


> +Additionally, the helper functions provided by the PM core obey the following
> +rules:
> +
> + * If ->runtime_suspend() is about to be executed or the execution of it is
> + scheduled or there's a pending request to execute it, ->runtime_idle() will
> + not be executed for the same device.

Shouldn't we allow runtime_idle when a suspend is scheduled? The idle
handler might decide to suspend right away instead of waiting for the
timer to expire.


> +4. Run-time PM Device Helper Functions
> +
> +The following run-time PM helper functions are defined in
> +drivers/base/power/runtime.c and include/linux/pm_runtime.h:

> + int pm_schedule_suspend(struct device *dev, unsigned int delay);
> + - schedule the execution of ->runtime_suspend() for the device's bus type
> + in future, where 'delay' is the time to wait before queuing up a suspend
> + work item in pm_wq, in miliseconds (if 'delay' is zero, the work item is

Fix spelling of "milli". Explain that the new delay will override the
old one if a suspend was already scheduled and not yet expired.

> + int pm_runtime_get_sync(struct device *dev);
> + - increment the device's usage counter, run pm_rutime_resume(dev) and return

Fix spelling of "runtime". Same under pm_runtime_put_sync.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/