Nishanth Menon <nm@xxxxxx> writes:
OMAP device hooks around suspend|resume_noirq ensures that hwmod
devices are forced to idle using omap_device_idle/enable as part of
the last stage of suspend activity.
For a device such as i2c who uses autosuspend, it is possible to enter
the suspend path with dev->power.runtime_status = RPM_ACTIVE.
As part of the suspend flow, the generic runtime logic would increment
it's dev->power.disable_depth to 1. This should prevent further
pm_runtime_get_sync from succeeding once the runtime_status has been
set to RPM_SUSPENDED.
Now, as part of the suspend_noirq handler in omap_device, we force the
following: if the device status is !suspended, we force the device
to idle using omap_device_idle (clocks are cut etc..). This ensures
that from a hardware perspective, the device is "suspended". However,
runtime_status is left to be active.
*if* an operation is attempted after this point to
pm_runtime_get_sync, runtime framework depends on runtime_status to
indicate accurately the device status, and since it sees it to be
ACTIVE, it assumes the module is functional and returns a non-error
value. As a result the user will see pm_runtime_get succeed, however a
register access will crash due to the lack of clocks.
Ouch.
Dumb Q: who is requesting an i2c transaction after ->suspend_noirq().
The i2c driver itself should be able to detect that it's being accessed
after this point and return an error.
That being said, I agree that omap_device should still be catching this
case in order to find/fix driver races like this.
To prevent this from happening, we should ensure that runtime_status
exactly indicates the device status. As a result of this change
any further calls to pm_runtime_get* would return -EACCES (since
disable_depth is 1). On resume, we restore the clocks and runtime
status exactly as we suspended with.
Reported-by: J Keerthy <j-keerthy@xxxxxx>
Signed-off-by: Nishanth Menon <nm@xxxxxx>
Acked-by: Rajendra Nayak <rnayak@xxxxxx>
---
patch baseline: V3.12 tag (also applies on linux-next next-20131107 tag)
Logs from 3.12 based vendor kernel:
Before: http://pastebin.com/m5KxnB7a
After: http://pastebin.com/8AfX4e7r
The discussion on cpufreq side of the story which triggered this scenario:
http://marc.info/?l=linux-pm&m=138263811321921&w=2
Tested on TI vendor kernel (with dt boot):
AM335x: evm, BBB, sk, BBW
OMAP5uEVM, DRA7-evm
arch/arm/mach-omap2/omap_device.c | 16 ++++++++++++++--
arch/arm/mach-omap2/omap_device.h | 1 +
2 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/arch/arm/mach-omap2/omap_device.c b/arch/arm/mach-omap2/omap_device.c
index b69dd9a..87ecbb0 100644
--- a/arch/arm/mach-omap2/omap_device.c
+++ b/arch/arm/mach-omap2/omap_device.c
@@ -621,6 +621,13 @@ static int _od_suspend_noirq(struct device *dev)
if (!ret && !pm_runtime_status_suspended(dev)) {
if (pm_generic_runtime_suspend(dev) == 0) {
+ if (!pm_runtime_suspended(dev)) {
Why the addition check for supended here?
This version (as opposed to the _status_suspended() version above will
fail if runtime PM has been disabled from userspace (via
/sys/devices/.../power/control), and will thus prevent low power states
from being hit in suspend if runtime suspend has been disabled from
userspace. That's a bug.
+ /* NOTE: *might* indicate driver race */
Yes, a driver race which should then be fixed in the driver.
+ dev_dbg(dev, "%s: Force suspending\n",
+ __func__);
+ pm_runtime_set_suspended(dev);
+ od->flags |= OMAP_DEVICE_SUSPEND_FORCED;
Not sure why you need an additonal flag. Why not just always do this
and use the existin OMAP_DEVICE_SUSPENDED flag.