Re: [PATCH v1] PM: sleep: core: Synchronize runtime PM status of parents and children
From: Jon Hunter
Date: Fri Feb 07 2025 - 08:39:17 EST
Hi Rafael,
On 28/01/2025 19:24, Rafael J. Wysocki wrote:
From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
Commit 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the
resume phase") overlooked the case in which the parent of a device with
DPM_FLAG_SMART_SUSPEND set did not use that flag and could be runtime-
suspended before a transition into a system-wide sleep state. In that
case, if the child is resumed during the subsequent transition from
that state into the working state, its runtime PM status will be set to
RPM_ACTIVE, but the runtime PM status of the parent will not be updated
accordingly, even though the parent will be resumed too, because of the
dev_pm_skip_suspend() check in device_resume_noirq().
Address this problem by tracking the need to set the runtime PM status
to RPM_ACTIVE during system-wide resume transitions for devices with
DPM_FLAG_SMART_SUSPEND set and all of the devices depended on by them.
Fixes: 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the resume phase")
Closes: https://lore.kernel.org/linux-pm/Z30p2Etwf3F2AUvD@xxxxxxxxxxxxxxxxxxxx/
Reported-by: Johan Hovold <johan@xxxxxxxxxx>
Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@xxxxxxxxxx>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
---
drivers/base/power/main.c | 29 ++++++++++++++++++++---------
include/linux/pm.h | 1 +
2 files changed, 21 insertions(+), 9 deletions(-)
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -656,13 +656,15 @@
* so change its status accordingly.
*
* Otherwise, the device is going to be resumed, so set its PM-runtime
- * status to "active", but do that only if DPM_FLAG_SMART_SUSPEND is set
- * to avoid confusing drivers that don't use it.
+ * status to "active" unless its power.set_active flag is clear, in
+ * which case it is not necessary to update its PM-runtime status.
*/
- if (skip_resume)
+ if (skip_resume) {
pm_runtime_set_suspended(dev);
- else if (dev_pm_skip_suspend(dev))
+ } else if (dev->power.set_active) {
pm_runtime_set_active(dev);
+ dev->power.set_active = false;
+ }
if (dev->pm_domain) {
info = "noirq power domain ";
@@ -1189,18 +1191,24 @@
return PMSG_ON;
}
-static void dpm_superior_set_must_resume(struct device *dev)
+static void dpm_superior_set_must_resume(struct device *dev, bool set_active)
{
struct device_link *link;
int idx;
- if (dev->parent)
+ if (dev->parent) {
dev->parent->power.must_resume = true;
+ if (set_active)
+ dev->parent->power.set_active = true;
+ }
idx = device_links_read_lock();
- list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node)
+ list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node) {
link->supplier->power.must_resume = true;
+ if (set_active)
+ link->supplier->power.set_active = true;
+ }
device_links_read_unlock(idx);
}
@@ -1278,8 +1286,11 @@
dev->power.may_skip_resume))
dev->power.must_resume = true;
- if (dev->power.must_resume)
- dpm_superior_set_must_resume(dev);
+ if (dev->power.must_resume) {
+ dev->power.set_active = dev->power.set_active ||
+ dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND);
+ dpm_superior_set_must_resume(dev, dev->power.set_active);
+ }
Complete:
complete_all(&dev->power.completion);
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -683,6 +683,7 @@
bool no_pm_callbacks:1; /* Owned by the PM core */
bool async_in_progress:1; /* Owned by the PM core */
bool must_resume:1; /* Owned by the PM core */
+ bool set_active:1; /* Owned by the PM core */
bool may_skip_resume:1; /* Set by subsystems */
#else
bool should_wakeup:1;
I am seeing the following crash during suspend on a couple of our boards (with mainline/next) and bisect is pointing to this commit ...
[ 216.311009] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 216.311229] Mem abort info:
[ 216.311298] ESR = 0x0000000096000004
[ 216.311393] EC = 0x25: DABT (current EL), IL = 32 bits
[ 216.311515] SET = 0, FnV = 0
[ 216.311593] EA = 0, S1PTW = 0
[ 216.311668] FSC = 0x04: level 0 translation fault
[ 216.311770] Data abort info:
[ 216.311855] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 216.311972] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 216.312081] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 216.312267] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010b905000
[ 216.312411] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[ 216.312620] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 216.312747] Modules linked in: snd_soc_tegra210_admaif snd_soc_tegra186_asrc snd_soc_tegra_pcm snd_soc_tegra210_mixer snd_soc_tegra210_adx snd_soc_tegra210_mvc snd_soc_tegra210_ope snd_soc_tegra210_dmic snd_soc_tegra210_amx snd_soc_tegra210_i2s snd_soc_tegra210_sfc tegra_drm drm_dp_aux_bus cec drm_display_helper drm_client_lib drm_kms_helper snd_soc_tegra210_ahub tegra210_adma drm snd_soc_tegra_audio_graph_card snd_soc_audio_graph_card snd_soc_rt5659 backlight snd_soc_simple_card_utils snd_soc_rl6231 ucsi_ccg typec_ucsi typec pwm_tegra ina3221 tegra_aconnect pwm_fan snd_hda_codec_hdmi snd_hda_tegra snd_hda_codec snd_hda_core phy_tegra194_p2u tegra_xudc at24 lm90 tegra_bpmp_thermal host1x pcie_tegra194 ip_tables x_tables ipv6
[ 216.354364] CPU: 1 UID: 0 PID: 14450 Comm: rtcwake Tainted: G W 6.14.0-rc1-next-20250206-g808eb958781e #1
[ 216.365388] Tainted: [W]=WARN
[ 216.368279] Hardware name: NVIDIA Jetson AGX Xavier Developer Kit (DT)
[ 216.375099] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 216.381928] pc : simple_pm_bus_runtime_suspend+0x14/0x48
[ 216.387698] lr : pm_generic_runtime_suspend+0x2c/0x44
[ 216.392962] sp : ffff80009003baf0
[ 216.396103] x29: ffff80009003baf0 x28: ffff000088043480 x27: 0000000000000000
[ 216.403453] x26: ffff800082c11350 x25: ffff800082e0e000 x24: ffff80008093d634
[ 216.410803] x23: 0000000000000000 x22: 0000000000000002 x21: ffff800082e0e000
[ 216.418411] x20: ffff800080938af8 x19: ffff0000808ec410 x18: ffffffffffff4ac8
[ 216.425328] x17: 2033303a35353a36 x16: 312031312d39302d x15: 000047edca5d49da
[ 216.432937] x14: ffff000088043500 x13: 0000000000000001 x12: 0000000000000000
[ 216.440110] x11: 000000321a7856a0 x10: 0000000000000aa0 x9 : ffff80009003b8e0
[ 216.447219] x8 : ffff000088043f80 x7 : ffff0003fdf08a40 x6 : 0000000000000000
[ 216.454291] x5 : 0000000000000000 x4 : ffff800081ffe8c0 x3 : ffff0000808ec410
[ 216.461723] x2 : ffff8000805d133c x1 : 0000000000000000 x0 : 0000000000000000
[ 216.468986] Call trace:
[ 216.471179] simple_pm_bus_runtime_suspend+0x14/0x48 (P)
[ 216.476775] pm_generic_runtime_suspend+0x2c/0x44
[ 216.481499] pm_runtime_force_suspend+0x54/0x14c
[ 216.486049] device_suspend_noirq+0x6c/0x278
[ 216.490253] dpm_suspend_noirq+0xc0/0x198
[ 216.494278] suspend_devices_and_enter+0x210/0x4c0
[ 216.499348] pm_suspend+0x164/0x1c8
[ 216.503023] state_store+0x8c/0xfc
[ 216.506260] kobj_attr_store+0x18/0x2c
[ 216.509940] sysfs_kf_write+0x44/0x54
[ 216.513699] kernfs_fop_write_iter+0x118/0x1a8
[ 216.518163] vfs_write+0x2b0/0x35c
[ 216.521399] ksys_write+0x68/0xfc
[ 216.524810] __arm64_sys_write+0x1c/0x28
[ 216.528574] invoke_syscall+0x48/0x110
[ 216.532253] el0_svc_common.constprop.0+0x40/0xe8
[ 216.536628] do_el0_svc+0x20/0x2c
[ 216.540299] el0_svc+0x30/0xd0
[ 216.543016] el0t_64_sync_handler+0x144/0x168
[ 216.547736] el0t_64_sync+0x198/0x19c
[ 216.551327] Code: a9be7bfd 910003fd a90153f3 f9403c00 (f9400014)
[ 216.557197] ---[ end trace 0000000000000000 ]---
I have not looked any further, but if you have any thoughts, let me know.
Thanks
Jon
--
nvpublic