Re: [PATCH v2] reboot: Add timeout for device shutdown

From: Greg KH
Date: Wed May 29 2024 - 08:51:49 EST


On Wed, May 29, 2024 at 11:00:49AM +0000, Soumya Khasnis wrote:
> The device shutdown callbacks invoked during shutdown/reboot
> are prone to errors depending on the device state or mishandling
> by one or more driver.

Why not fix those drivers? A release callback should not stall, and if
it does, that's a bug that should be fixed there.

Or use a watchdog and just reboot if that triggers at shutdown time.

> In order to prevent a device hang in such
> scenarios, we bail out after a timeout while dumping a meaningful
> call trace of the shutdown callback which blocks the shutdown or
> reboot process.

Dump it where?


>
> Signed-off-by: Soumya Khasnis <soumya.khasnis@xxxxxxxx>
> Signed-off-by: Srinavasa Nagaraju <Srinavasa.Nagaraju@xxxxxxxx>
> ---
> drivers/base/Kconfig | 15 +++++++++++++++
> kernel/reboot.c | 46 +++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 60 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index 2b8fd6bb7da0..d06e379b6281 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -243,3 +243,18 @@ config FW_DEVLINK_SYNC_STATE_TIMEOUT
> work on.
>
> endmenu
> +
> +config DEVICE_SHUTDOWN_TIMEOUT
> + bool "device shutdown timeout"
> + default n

That is the default, so no need for this.


> + help
> + Enable timeout for device shutdown. Helps in case device shutdown
> + is hung during shoutdonw and reboot.
> +
> +
> +config DEVICE_SHUTDOWN_TIMEOUT_SEC
> + int "device shutdown timeout in seconds"
> + default 5
> + depends on DEVICE_SHUTDOWN_TIMEOUT
> + help
> + sets time for device shutdown timeout in seconds

You need much more help text for all of these.

And why are these in the drivers/base/Kconfig file? It has nothing to
do with "devices", or the driver core, it's all core kernel reboot
logic.


> diff --git a/kernel/reboot.c b/kernel/reboot.c
> index 22c16e2564cc..8460bd24563b 100644
> --- a/kernel/reboot.c
> +++ b/kernel/reboot.c
> @@ -18,7 +18,7 @@
> #include <linux/syscalls.h>
> #include <linux/syscore_ops.h>
> #include <linux/uaccess.h>
> -
> +#include <linux/sched/debug.h>

Why remove the blank line?

> /*
> * this indicates whether you can reboot with ctrl-alt-del: the default is yes
> */
> @@ -48,6 +48,14 @@ int reboot_cpu;
> enum reboot_type reboot_type = BOOT_ACPI;
> int reboot_force;
>
> +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> +struct device_shutdown_timeout {
> + struct timer_list timer;
> + struct task_struct *task;
> +} devs_shutdown;
> +#define SHUTDOWN_TIMEOUT CONFIG_DEVICE_SHUTDOWN_TIMEOUT_SEC
> +#endif

#ifdefs should not be in .c files, please put this in a .h file where it
belongs. Same for the other #ifdefs.



> +
> struct sys_off_handler {
> struct notifier_block nb;
> int (*sys_off_cb)(struct sys_off_data *data);
> @@ -88,12 +96,46 @@ void emergency_restart(void)
> }
> EXPORT_SYMBOL_GPL(emergency_restart);
>
> +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> +static void device_shutdown_timeout_handler(struct timer_list *t)
> +{
> + pr_emerg("**** device shutdown timeout ****\n");

What does this have to do with "devices"? This is a whole-system issue,
or really a "broken driver" issue.

> + show_stack(devs_shutdown.task, NULL, KERN_EMERG);

How do you know this is the 'device shutdown' stack? What is a "device
shutdown"?

> + if (system_state == SYSTEM_RESTART)
> + emergency_restart();
> + else
> + machine_power_off();
> +}
> +
> +static void device_shutdown_timer_set(void)
> +{
> + devs_shutdown.task = current;

It's just the normal shutdown stack/process, why call it a device?

> + timer_setup(&devs_shutdown.timer, device_shutdown_timeout_handler, 0);
> + devs_shutdown.timer.expires = jiffies + SHUTDOWN_TIMEOUT * HZ;
> + add_timer(&devs_shutdown.timer);
> +}
> +
> +static void device_shutdown_timer_clr(void)
> +{
> + del_timer(&devs_shutdown.timer);
> +}
> +#else
> +static inline void device_shutdown_timer_set(void)
> +{
> +}
> +static inline void device_shutdown_timer_clr(void)
> +{
> +}
> +#endif
> +
> void kernel_restart_prepare(char *cmd)
> {
> blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
> system_state = SYSTEM_RESTART;
> usermodehelper_disable();
> + device_shutdown_timer_set();
> device_shutdown();
> + device_shutdown_timer_clr();

Why isn't all of this done in device_shutdown() if you think it is a
device issue? Why put it in reboot.c?

thanks,

greg k-h