Re: [PATCH v8 3/4] driver core: shut down devices asynchronously

From: stuart hayes
Date: Wed Sep 11 2024 - 18:06:55 EST




On 9/11/2024 12:51 AM, Jan Kiszka wrote:
On 11.09.24 02:14, stuart hayes wrote:


On 9/8/2024 8:36 AM, Jan Kiszka wrote:
On 22.08.24 22:28, Stuart Hayes wrote:
Add code to allow asynchronous shutdown of devices, ensuring that each
device is shut down before its parents & suppliers.

Only devices with drivers that have async_shutdown_enable enabled
will be
shut down asynchronously.

This can dramatically reduce system shutdown/reboot time on systems that
have multiple devices that take many seconds to shut down (like certain
NVMe drives). On one system tested, the shutdown time went from 11
minutes
without this patch to 55 seconds with the patch.

Signed-off-by: Stuart Hayes <stuart.w.hayes@xxxxxxxxx>
Signed-off-by: David Jeffery <djeffery@xxxxxxxxxx>
---
  drivers/base/base.h           |  4 +++
  drivers/base/core.c           | 54 ++++++++++++++++++++++++++++++++++-
  include/linux/device/driver.h |  2 ++
  3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 0b53593372d7..aa5a2bd3f2b8 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -10,6 +10,7 @@
   * shared outside of the drivers/base/ directory.
   *
   */
+#include <linux/async.h>
  #include <linux/notifier.h>
    /**
@@ -97,6 +98,8 @@ struct driver_private {
   *    the device; typically because it depends on another driver
getting
   *    probed first.
   * @async_driver - pointer to device driver awaiting probe via
async_probe
+ * @shutdown_after - used during device shutdown to ensure correct
shutdown
+ *    ordering.
   * @device - pointer back to the struct device that this structure is
   * associated with.
   * @dead - This device is currently either in the process of or has
been
@@ -114,6 +117,7 @@ struct device_private {
      struct list_head deferred_probe;
      const struct device_driver *async_driver;
      char *deferred_probe_reason;
+    async_cookie_t shutdown_after;
      struct device *device;
      u8 dead:1;
  };
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 7e50daa65ca0..dd3652ea56fe 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -9,6 +9,7 @@
   */
    #include <linux/acpi.h>
+#include <linux/async.h>
  #include <linux/cpufreq.h>
  #include <linux/device.h>
  #include <linux/err.h>
@@ -3531,6 +3532,7 @@ static int device_private_init(struct device *dev)
      klist_init(&dev->p->klist_children, klist_children_get,
             klist_children_put);
      INIT_LIST_HEAD(&dev->p->deferred_probe);
+    dev->p->shutdown_after = 0;
      return 0;
  }
  @@ -4781,6 +4783,8 @@ int device_change_owner(struct device *dev,
kuid_t kuid, kgid_t kgid)
  }
  EXPORT_SYMBOL_GPL(device_change_owner);
  +static ASYNC_DOMAIN(sd_domain);
+
  static void shutdown_one_device(struct device *dev)
  {
      /* hold lock to avoid race with probe/release */
@@ -4816,12 +4820,34 @@ static void shutdown_one_device(struct device
*dev)
          put_device(dev->parent);
  }
  +/**
+ * shutdown_one_device_async
+ * @data: the pointer to the struct device to be shutdown
+ * @cookie: not used
+ *
+ * Shuts down one device, after waiting for shutdown_after to complete.
+ * shutdown_after should be set to the cookie of the last child or
consumer
+ * of this device to be shutdown (if any), or to the cookie of the
previous
+ * device to be shut down for devices that don't enable asynchronous
shutdown.
+ */
+static void shutdown_one_device_async(void *data, async_cookie_t
cookie)
+{
+    struct device *dev = data;
+
+    async_synchronize_cookie_domain(dev->p->shutdown_after + 1,
&sd_domain);
+
+    shutdown_one_device(dev);
+}
+
  /**
   * device_shutdown - call ->shutdown() on each device to shutdown.
   */
  void device_shutdown(void)
  {
      struct device *dev, *parent;
+    async_cookie_t cookie = 0;
+    struct device_link *link;
+    int idx;
        wait_for_device_probe();
      device_block_probing();
@@ -4852,11 +4878,37 @@ void device_shutdown(void)
          list_del_init(&dev->kobj.entry);
          spin_unlock(&devices_kset->list_lock);
  -        shutdown_one_device(dev);
+
+        /*
+         * Set cookie for devices that will be shut down synchronously
+         */
+        if (!dev->driver || !dev->driver->async_shutdown_enable)
+            dev->p->shutdown_after = cookie;
+
+        get_device(dev);
+        get_device(parent);
+
+        cookie = async_schedule_domain(shutdown_one_device_async,
+                           dev, &sd_domain);
+        /*
+         * Ensure parent & suppliers wait for this device to shut down
+         */
+        if (parent) {
+            parent->p->shutdown_after = cookie;
+            put_device(parent);
+        }
+
+        idx = device_links_read_lock();
+        list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
+                device_links_read_lock_held())
+            link->supplier->p->shutdown_after = cookie;

This will not fly if a supplier registered after its consumer. As we are
walking the list backward, the supplier will now wait for something that
is coming later during shutdown if the affected devices are still doing
this synchronously (as almost all at this stage). This creates a
circular dependency.

Seems to explain the reboot hang that I'm seeing on an embedded target
with a mailbox dev waiting for a remoteproc dev while the mailbox being
after the remoteproc in the list (thus first on shutting down).

This resolves the issue for me so far, but I don't think we are done yet:

list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
        device_links_read_lock_held()) {
    if (link->supplier->driver &&
        link->supplier->driver->async_shutdown_enable)
        link->supplier->p->shutdown_after = cookie;
}

I wonder if overwriting the supplier's shutdown_after unconditionally is
a good idea. A supplier may have multiple consumers - will we overwrite
in the right order then? And why do we now need this ordering when we
were so far shutting down suppliers while consumers were still up?


The devices_kset list should already be in the right order for shutting
stuff down--i.e., parents and suppliers should be shutdown later as the
device_shutdown loop goes through the devices.

With async shutdown this loop still goes the devices_kset list in the same
order it did before the patch, so my expectation was that any
parents/suppliers
would come later in the loop than any children/siblings, and it would
update
shutdown_after as it went to ensure that each device ended up with the
cookie
of the last child/consumer that it needed to wait for.

However, I missed that the devices_kset list isn't always reordered when a
devlink is added and a consumer isn't dependent on the supplier (see
device_is_dependent()).  I have a patch would address that, and add a
sanity
check in case any devices get in the list in the wrong order somehow:


diff --git a/drivers/base/core.c b/drivers/base/core.c
index b69b82da8837..52d64b419c01 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -4832,6 +4832,13 @@ static void shutdown_one_device_async(void *data,
async_cookie_t cookie)
 {
     struct device *dev = data;
+    /*
+     * Sanity check to prevent shutdown hang in case a parent or supplier
+     * is in devices_kset list in the wrong order
+     */
+    if (dev->p->shutdown_after > cookie)
+        dev->p->shutdown_after = cookie - 1;
+
     async_synchronize_cookie_domain(dev->p->shutdown_after + 1,
&sd_domain);
     shutdown_one_device(dev);
@@ -4898,8 +4905,11 @@ void device_shutdown(void)
         idx = device_links_read_lock();
         list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
-                device_links_read_lock_held())
+                device_links_read_lock_held()) {
+            if (device_link_flag_is_sync_state_only(link->flags))
+                continue;
             link->supplier->p->shutdown_after = cookie;
+        }
         device_links_read_unlock(idx);
         put_device(dev);

I'll submit this shortly if nobody responds with any issues with this.

Thank you!


This sounds widely reasonable to me, and a quick check confirmed that it
apparently resolves the issue I was seeing.

I'm still wondering, though, if overwriting the parent's shutdown_after
and only checking later on in shutdown_one_device_async is sufficient or
if it wouldn't be safer to have a check when we write. The fact that
there could be multiple children for a parent is worrying me.

Jan


Having multiple children isn't a problem. All of the children will be in
the shutdown loop before the parent, and as each of them is seen, the
parent's shutdown_after will be updated with the cookie of the latest
child to be shut down. When the parent then does a synchronize_cookie to
wait for that last child, it won't continue until the earlier children have
also shutdown, because synchronize_cookie doesn't just wait for that one
cookie--it waits until the specified cookie is the lowest cookie in
progress, which means all the earlier children are also done shutting down.


Same overwrite question applies to setting shutdown_after in parents.
Don't we rather need a list for shutdown_after, at least once everything
is async?

This needs to be thought through once more, I guess.

Jan

+        device_links_read_unlock(idx);
+        put_device(dev);
            spin_lock(&devices_kset->list_lock);
      }
      spin_unlock(&devices_kset->list_lock);
+    async_synchronize_full_domain(&sd_domain);
  }
    /*
diff --git a/include/linux/device/driver.h
b/include/linux/device/driver.h
index 1fc8b68786de..2b6127faaa25 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -56,6 +56,7 @@ enum probe_type {
   * @mod_name:    Used for built-in modules.
   * @suppress_bind_attrs: Disables bind/unbind via sysfs.
   * @probe_type:    Type of the probe (synchronous or asynchronous)
to use.
+ * @async_shutdown_enable: Enables devices to be shutdown
asynchronously.
   * @of_match_table: The open firmware table.
   * @acpi_match_table: The ACPI match table.
   * @probe:    Called to query the existence of a specific device,
@@ -102,6 +103,7 @@ struct device_driver {
        bool suppress_bind_attrs;    /* disables bind/unbind via
sysfs */
      enum probe_type probe_type;
+    bool async_shutdown_enable;
        const struct of_device_id    *of_match_table;
      const struct acpi_device_id    *acpi_match_table;