Re: [RFC][PATCH 1/3] PM / sleep: Flags to speed up suspend-resume of runtime-suspended devices

From: Alan Stern
Date: Fri May 02 2014 - 14:44:51 EST


On Fri, 2 May 2014, Rafael J. Wysocki wrote:

> Well, I have a second update.
>
> It has different flag names and changelog (that should explain things better
> hopefully) and the purpose of both flags should be more clear now (patch [3/3]
> would need to be reworked on top of this, but for now let's just discuss the
> core changes).

We've got patch descriptions passing in the night! :-)

This doesn't contain any changes to the patch itself, apart from the
flag names, right? The description below is much better than the
earlier one, but I still feel this deserves to be split in two: one
patch for each new flag.

> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> Subject: PM / sleep: Flags to speed up suspend-resume of runtime-suspended devices
>
> Currently, some subsystems (e.g. PCI and the ACPI PM domain) have to
> resume all runtime-suspended devices during system suspend, mostly
> because those devices may need to be reprogrammed due to different
> wakeup settings for system sleep and for runtime PM. However, at
> least in some cases, that isn't really necessary, because the wakeup
> settings may not be really different.
>
> The idea here is that subsystems should know whether or not it is
> necessary to resume a given device during system suspend as long as
> they know that the device's children will not need it to be functional
> during the late/early and noirq phases of their suspend and resume.

Perhaps the matter of the children's requirements should be discussed
more fully. I skimmed over it in my suggested description too.

Under what conditions will a child need the parent device to be
functional? Let's start by assuming the parent's ignore_children
bit isn't set.

By this assumption, if the child was at full power during the suspend
stages then the parent would have to be at full power too. So let's
assume that the child is in runtime suspend when its ->suspend()
routine runs. I can't think of any scenario where the child's driver
would require the parent to be at full power without also needing the
child to be at full power. If the child really does need to be at full
power then the driver will have to do a runtime resume, which would
also bring the parent to full power. Either way, we don't have to do
anything special -- during the suspend stages, if the child needs the
parent to be at full power then it will be.

(As a variant of this case, maybe the child belongs to one of the
subsystems like PCI, and its driver expects the subsystem to
runtime-resume the child before invoking its ->suspend() callback.
When the subsystem does this, the parent will automatically be resumed
as well. Again there are no special requirements; the point is moot
because the parent will never be runtime-suspended when its ->suspend()
routine is ready to run.)

During the resume stages, if the child is going to be restored to full
power then certainly the parent has to be at full power first.
Drivers expect this, so if we're going to leave the parent in runtime
suspend during system resume, we have to get the child driver's
permission first. _That's_ what the parent_needed flag should mean.

What about the case where ignore_children _is_ set? Then the child's
driver might indeed need the parent to be at full power during system
suspend, since we could start off with the parent suspended and the
child active.

Putting these arguments together, the result is that during system
suspend we don't care about the children's needs unless the parent's
ignore_children bit is set. But during system resume, we must resume
the parent unless the child's driver says we don't have to.

As a corollary, if we don't have the child's permission to leave the
parent suspended during system resume then we have to invoke all of the
parent's resume callbacks, which means we also have to invoke all the
suspend callbacks. However, we still might be able to leave the parent
in runtime suspend during the suspend stages. The decision whether or
not to do so should be up to the subsystem or driver, not the PM core;
the subsystem's callback routines can check the device's runtime status
and then do what they want.

> To help them with that, introduce two new device PM flags:
> power.parent_needed and power.leave_runtime_suspended supposed to work
> as follows.
>
> The PM core will clear power.leave_runtime_suspended and will set
> power.parent_needed for all devices in dpm_prepare(). Next, the
> subsystem (or driver) of a device that in principle may not need
> to be resumed during system suspend, if runtume-suspended already,
> will set power.leave_runtime_suspended in its ->prepare() callback.
> Also the subsystems (or drivers) of devices whose parents need not
> be resumed during system suspend, if runtime-suspended already,
> are supposed to clear power.parent_needed for them. The PM core
> will then clear power.leave_runtime_suspended for the parents of
> all devices having power.parent_needed set in __device_suspend().

You are using leave_runtime_suspended to mean two different things:
remain runtime-suspended during the system suspend stages (i.e., no
reprogramming is needed so don't go to full power), and remain
runtime-suspended during both the system suspend and system resume
stages. Only the first meaning matters if all you want to accomplish
is to avoid unnecessary runtime resumes during system suspend.

For the first meaning -- and I claim that this is the appropriate
meaning for this patch -- the leave_runtime_suspend flag doesn't depend
on the children's needs, except in the case where the parent's
ignore_children bit is set. In that case, we could simply force the
parent's leave_runtime_suspended flag to be always off. Or we could
leave it set if it is set in all of the parent's children.

The parent_needed flag is the one that really has to propagate up the
device tree. If this flag is set in a child then the PM core has to
invoke all the suspend and resume callbacks, not just in the child's
parent but in all its ancestors. (Perhaps you could stop if you reach
an ancestor with ignore_children set, but it's safer not to.)

> Now, if the ->suspend() callback is executed for a device whose
> power.leave_runtime_suspended is set, it can simply return 0 after
> checking the device's state if that state is appropriate for
> system suspend. The PM core will then skip the late/early and
> noirq system suspend/resume callbacks for that device and will
> use pm_runtime_resume() to resume it in device_resume().

By the discussion above, the PM core shouldn't skip anything
unless the parent_needed flag is clear.

> If the state of a device with power.leave_runtime_suspended is not
> appropriate for system suspend, the ->suspend() callback should
> resume it using pm_runtime_resume() and clear
> power.leave_runtime_suspended for it.

Oh yes, I forgot to discuss this earlier. We have two choices for
handling this:

As you wrote above, require drivers not to set
leave_runtime_suspended if the device isn't in an appropriate
state, and propagate the flag up the device tree. (But as
I mentioned, in most cases the flag shouldn't need to be
propagated.)

Make the PM core automatically clear leave_runtime_suspended
whenever the device is (or becomes) runtime-active. Then
callbacks don't have to check whether the device actually is in
runtime suspend; they just have to check the flag.

I prefer the second choice, because it is easier for drivers.

> Note: Drivers (or bus types etc.) can reasonably expect that the
> next PM callback executed after ->runtime_suspend() will be
> ->runtime_resume() rather than ->resume_noirq() or ->resume_early().
> This change is designed with that expectation in mind.

Except, of course, that in the current kernel this isn't true. And
there probably are a few cases where it can't ever be true.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/