Re: [PATCH v1 5/5] PM: sleep: Make late and noirq suspend of devices more asynchronous
From: Saravana Kannan
Date: Wed Mar 12 2025 - 21:48:01 EST
On Tue, Feb 25, 2025 at 8:46 AM Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>
> In analogy with previous changes, make device_suspend_late() and
> device_suspend_noirq() start the async suspend of the device's parent
> and suppliers after the device itself has been processed and make
> dpm_suspend_late() and dpm_noirq_suspend_devices() start processing
> "async" leaf devices (that is, devices without children or consumers)
> upfront because they don't need to wait for any other devices.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> ---
> drivers/base/power/main.c | 60 +++++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 52 insertions(+), 8 deletions(-)
>
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -1317,6 +1317,8 @@
> device_links_read_unlock(idx);
> }
>
> +static void async_suspend_noirq(void *data, async_cookie_t cookie);
> +
> /**
> * device_suspend_noirq - Execute a "noirq suspend" callback for given device.
> * @dev: Device to handle.
> @@ -1396,7 +1398,13 @@
> Complete:
> complete_all(&dev->power.completion);
> TRACE_SUSPEND(error);
> - return error;
> +
> + if (error || async_error)
> + return error;
> +
> + dpm_async_suspend_superior(dev, async_suspend_noirq);
> +
> + return 0;
> }
>
> static void async_suspend_noirq(void *data, async_cookie_t cookie)
> @@ -1410,6 +1418,7 @@
> static int dpm_noirq_suspend_devices(pm_message_t state)
> {
> ktime_t starttime = ktime_get();
> + struct device *dev;
> int error = 0;
>
> trace_suspend_resume(TPS("dpm_suspend_noirq"), state.event, true);
> @@ -1419,15 +1428,28 @@
>
> mutex_lock(&dpm_list_mtx);
>
> + /*
> + * Start processing "async" leaf devices upfront because they don't need
> + * to wait.
> + */
> + list_for_each_entry_reverse(dev, &dpm_late_early_list, power.entry) {
> + dpm_clear_async_state(dev);
> + if (dpm_leaf_device(dev))
> + dpm_async_fn(dev, async_suspend_noirq);
> + }
> +
> while (!list_empty(&dpm_late_early_list)) {
> - struct device *dev = to_device(dpm_late_early_list.prev);
> + dev = to_device(dpm_late_early_list.prev);
>
> list_move(&dev->power.entry, &dpm_noirq_list);
This issue is present in the previous patch too, but it's easier for
me to point it out here. Suspend abort will break now.
For example, say the devices are suspended in the order A -> B -> C ->
D -> E if everything was sync.
Case 1: Fully sync devices
If C aborts, only A and B will be in the dpm_noirq_list. When we try
to undo the suspend, we just resume devices in dpm_noirq_list and that
just resumes A and B.
Case 2: Only C is sync.
When C aborts, A, B, D and E could have finished suspending. But only
A and B will be in the dpm_noirq_list. When we try to undo the
suspend, we just resume devices in dpm_noirq_list and that just
resumes A and B. D and E never get resumed.
My fix for this is to move all devices to dpm_noirq_list if a suspend
aborts and then using the existing
is_suspended/is_noirq_suspended/is_late_suspended flags to skip over
devices that haven't been suspended. That works nicely except in
is_suspended and I tried to fix it in [2]. But you had an issue with
[2] that I didn't fully understand and I meant to dig deeper and fix.
But I didn't get around to it as I got swamped with other work.
[2] - https://lore.kernel.org/linux-pm/20241114220921.2529905-2-saravanak@xxxxxxxxxx/
Thanks,
Saravana
>
> - dpm_clear_async_state(dev);
> - if (dpm_async_fn(dev, async_suspend_noirq))
> + dpm_async_unless_in_progress(dev, async_suspend_noirq);
> +
> + if (dev->power.work_in_progress)
> continue;
>
> + dev->power.work_in_progress = true;
> +
> get_device(dev);
>
> mutex_unlock(&dpm_list_mtx);
> @@ -1492,6 +1514,8 @@
> spin_unlock_irq(&parent->power.lock);
> }
>
> +static void async_suspend_late(void *data, async_cookie_t cookie);
> +
> /**
> * device_suspend_late - Execute a "late suspend" callback for given device.
> * @dev: Device to handle.
> @@ -1568,7 +1592,13 @@
> Complete:
> TRACE_SUSPEND(error);
> complete_all(&dev->power.completion);
> - return error;
> +
> + if (error || async_error)
> + return error;
> +
> + dpm_async_suspend_superior(dev, async_suspend_late);
> +
> + return 0;
> }
>
> static void async_suspend_late(void *data, async_cookie_t cookie)
> @@ -1586,6 +1616,7 @@
> int dpm_suspend_late(pm_message_t state)
> {
> ktime_t starttime = ktime_get();
> + struct device *dev;
> int error = 0;
>
> trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
> @@ -1597,15 +1628,28 @@
>
> mutex_lock(&dpm_list_mtx);
>
> + /*
> + * Start processing "async" leaf devices upfront because they don't need
> + * to wait.
> + */
> + list_for_each_entry_reverse(dev, &dpm_suspended_list, power.entry) {
> + dpm_clear_async_state(dev);
> + if (dpm_leaf_device(dev))
> + dpm_async_fn(dev, async_suspend_late);
> + }
> +
> while (!list_empty(&dpm_suspended_list)) {
> - struct device *dev = to_device(dpm_suspended_list.prev);
> + dev = to_device(dpm_suspended_list.prev);
>
> list_move(&dev->power.entry, &dpm_late_early_list);
>
> - dpm_clear_async_state(dev);
> - if (dpm_async_fn(dev, async_suspend_late))
> + dpm_async_unless_in_progress(dev, async_suspend_late);
> +
> + if (dev->power.work_in_progress)
> continue;
>
> + dev->power.work_in_progress = true;
> +
> get_device(dev);
>
> mutex_unlock(&dpm_list_mtx);
>
>
>