Re: [PATCH v3] driver core: Add timeout for device shutdown

From: Daniel Lezcano
Date: Thu Jun 06 2024 - 11:25:51 EST


On 06/06/2024 10:50, Soumya Khasnis wrote:
The device shutdown callbacks invoked during shutdown/reboot
are prone to errors depending on the device state or mishandling
by one or more driver. In order to prevent a device hang in such
scenarios, we bail out after a timeout while dumping a meaningful
call trace of the shutdown callback to kernel logs, which blocks
the shutdown or reboot process.

Is that not somehow already achieved by the watchdog mechanism ?

Signed-off-by: Soumya Khasnis <soumya.khasnis@xxxxxxxx>
Signed-off-by: Srinavasa Nagaraju <Srinavasa.Nagaraju@xxxxxxxx>
---
Changes in v3:
-fix review comments
-updated commit message

drivers/base/Kconfig | 18 ++++++++++++++++++
drivers/base/base.h | 8 ++++++++
drivers/base/core.c | 40 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 66 insertions(+)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 2b8fd6bb7da0..342d3f87a404 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -243,3 +243,21 @@ config FW_DEVLINK_SYNC_STATE_TIMEOUT
work on.
endmenu
+
+config DEVICE_SHUTDOWN_TIMEOUT
+ bool "device shutdown timeout"
+ default y
+ help
+ Enable timeout for device shutdown. In case of device shutdown is
+ broken or device is not responding, system shutdown or restart may hang.
+ This timeout handles such situation and triggers emergency_restart or
+ machine_power_off. Also dumps call trace of shutdown process.
+
+
+config DEVICE_SHUTDOWN_TIMEOUT_SEC
+ int "device shutdown timeout in seconds"
+ range 10 60
+ default 10

How do you know the shutdown time is between this range?

What about large systems ?

+ depends on DEVICE_SHUTDOWN_TIMEOUT
+ help
+ sets time for device shutdown timeout in seconds
diff --git a/drivers/base/base.h b/drivers/base/base.h
index 0738ccad08b2..97eea57a8868 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -243,3 +243,11 @@ static inline int devtmpfs_delete_node(struct device *dev) { return 0; }
void software_node_notify(struct device *dev);
void software_node_notify_remove(struct device *dev);
+
+#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
+struct device_shutdown_timeout {
+ struct timer_list timer;
+ struct task_struct *task;
+};
+#define SHUTDOWN_TIMEOUT CONFIG_DEVICE_SHUTDOWN_TIMEOUT_SEC
+#endif
diff --git a/drivers/base/core.c b/drivers/base/core.c
index b93f3c5716ae..dab455054a80 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -35,6 +35,12 @@
#include "base.h"
#include "physical_location.h"
#include "power/power.h"
+#include <linux/sched/debug.h>
+#include <linux/reboot.h>
+
+#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
+struct device_shutdown_timeout devs_shutdown;
+#endif
/* Device links support. */
static LIST_HEAD(deferred_sync);
@@ -4799,6 +4805,38 @@ int device_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid)
}
EXPORT_SYMBOL_GPL(device_change_owner);
+#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
+static void device_shutdown_timeout_handler(struct timer_list *t)
+{
+ pr_emerg("**** device shutdown timeout ****\n");
+ show_stack(devs_shutdown.task, NULL, KERN_EMERG);
+ if (system_state == SYSTEM_RESTART)
+ emergency_restart();
+ else
+ machine_power_off();
+}

So if one device is misbehaving, all the others shutdown callbacks are skipped with emergency halt/reboot ? That is prone to break the system, no?

+static void device_shutdown_timer_set(void)
+{
+ devs_shutdown.task = current;
+ timer_setup(&devs_shutdown.timer, device_shutdown_timeout_handler, 0);
+ devs_shutdown.timer.expires = jiffies + SHUTDOWN_TIMEOUT * HZ;
+ add_timer(&devs_shutdown.timer);
+}
+
+static void device_shutdown_timer_clr(void)
+{
+ del_timer(&devs_shutdown.timer);
+}
+#else
+static inline void device_shutdown_timer_set(void)
+{
+}
+static inline void device_shutdown_timer_clr(void)
+{
+}
+#endif
+
/**
* device_shutdown - call ->shutdown() on each device to shutdown.
*/
@@ -4810,6 +4848,7 @@ void device_shutdown(void)
device_block_probing();
cpufreq_suspend();
+ device_shutdown_timer_set();
spin_lock(&devices_kset->list_lock);
/*
@@ -4869,6 +4908,7 @@ void device_shutdown(void)
spin_lock(&devices_kset->list_lock);
}
spin_unlock(&devices_kset->list_lock);
+ device_shutdown_timer_clr();
}
/*

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog