Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent

From: Jon Hunter
Date: Thu May 01 2025 - 05:51:39 EST


Hi Rafael,

On 14/03/2025 12:50, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>
> According to [1], the handling of device suspend and resume, and
> particularly the latter, involves unnecessary overhead related to
> starting new async work items for devices that cannot make progress
> right away because they have to wait for other devices.
>
> To reduce this problem in the resume path, use the observation that
> starting the async resume of the children of a device after resuming
> the parent is likely to produce less scheduling and memory management
> noise than starting it upfront while at the same time it should not
> increase the resume duration substantially.
>
> Accordingly, modify the code to start the async resume of the device's
> children when the processing of the parent has been completed in each
> stage of device resume and only start async resume upfront for devices
> without parents.
>
> Also make it check if a given device can be resumed asynchronously
> before starting the synchronous resume of it in case it will have to
> wait for another that is already resuming asynchronously.
>
> In addition to making the async resume of devices more friendly to
> systems with relatively less computing resources, this change is also
> preliminary for analogous changes in the suspend path.
>
> On the systems where it has been tested, this change by itself does
> not affect the overall system resume duration in a measurable way.
>
> Link: https://lore.kernel.org/linux-pm/20241114220921.2529905-1-saravanak@xxxxxxxxxx/ [1]
> Suggested-by: Saravana Kannan <saravanak@xxxxxxxxxx>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
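
For reference, the ordering described in the patch text above can be modelled roughly as follows. This is only a toy sketch in Python, not the kernel code: the function name and the device names are hypothetical, it considers parent/child relationships only (no other dependencies between devices), and it runs everything sequentially rather than as real async work items. It shows the one idea that async resume is started upfront only for devices without parents, and that a device's children are started once the parent has been processed.

```python
from collections import defaultdict, deque


def resume_start_order(parents):
    """Return the order in which resume is started for each device.

    `parents` maps a device name to its parent's name, or to None for a
    device without a parent. Hypothetical model, not a kernel interface.
    """
    children = defaultdict(list)
    for dev, parent in parents.items():
        if parent is not None:
            children[parent].append(dev)

    # Only devices without parents are started upfront.
    pending = deque(dev for dev, parent in parents.items() if parent is None)

    started = []
    while pending:
        dev = pending.popleft()
        # The device is processed here; its parent (if any) is already done,
        # so it never has to sit waiting for the parent.
        started.append(dev)
        # Children are started only after the parent has been processed.
        pending.extend(children[dev])
    return started


if __name__ == "__main__":
    # Hypothetical topology loosely shaped like a bus with an I2C controller.
    topology = {
        "bus": None,
        "i2c": "bus",
        "sensor_a": "i2c",
        "sensor_b": "i2c",
        "rtc": None,
    }
    print(resume_start_order(topology))
```

In this model no device is ever started before its parent has finished, which is the property the patch relies on to avoid work items that cannot make progress right away.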


I have noticed a suspend regression with -next on a couple of our Tegra boards. Bisection points to the following merge commit:

# first bad commit: [218a7bbf861f83398ac9767620e91983e36eac05] Merge branch 'pm-sleep' into linux-next

On top of next-20250429, I found that reverting the following changes makes suspend work again:

Revert "PM: sleep: Resume children after resuming the parent"
Revert "PM: sleep: Suspend async parents after suspending children"
Revert "PM: sleep: Make suspend of devices more asynchronous"

I have been looking into this a bit more to see which device is failing. By adding a bit of debug, I found that entry to suspend was failing on the Tegra194 Jetson AGX Xavier (tegra194-p2972-0000.dts) platform when one of the I2C controllers (i2c@c240000) was being suspended.

I found that if I disable only this I2C controller in device tree, suspend works again on top of -next. This I2C controller has 3 devices on the platform: two ina3221 devices and one Cypress Type-C controller. I then found that removing only the two ina3221 devices (in tegra194-p2888.dtsi) also allows suspend to work.

At this point, it is still unclear to me why this is now failing. If you have any thoughts or things I can try, please let me know.

Thanks!
Jon

--
nvpublic