Re: [PATCH v2] reboot: Add timeout for device shutdown
From: Khasnis Soumya
Date: Thu Jun 06 2024 - 05:04:11 EST
On Wed, May 29, 2024 at 02:51:48PM +0200, Greg KH wrote:
> On Wed, May 29, 2024 at 11:00:49AM +0000, Soumya Khasnis wrote:
> > The device shutdown callbacks invoked during shutdown/reboot
> > are prone to errors depending on the device state or mishandling
> > by one or more driver.
>
> Why not fix those drivers? A release callback should not stall, and if
> it does, that's a bug that should be fixed there.
Yes, the drivers must be fixed. But currently there is no good mechanism
which detects such stalls and there by resulting in a system hang. This
serves as a fail-safe mechanism to prevent such stalls at reboot/shutdown.
>
> Or use a watchdog and just reboot if that triggers at shutdown time.
The hard or software watchdog enabled by config_lockup_detector couldn’t
detect the cases when stalled on IO wait (wait_for_completion/io)
>
> > In order to prevent a device hang in such
> > scenarios, we bail out after a timeout while dumping a meaningful
> > call trace of the shutdown callback which blocks the shutdown or
> > reboot process.
>
> Dump it where?
Dump to kernel log and there by the console. Will update the commit message
>
>
> >
> > Signed-off-by: Soumya Khasnis <soumya.khasnis@xxxxxxxx>
> > Signed-off-by: Srinavasa Nagaraju <Srinavasa.Nagaraju@xxxxxxxx>
> > ---
> > drivers/base/Kconfig | 15 +++++++++++++++
> > kernel/reboot.c | 46 +++++++++++++++++++++++++++++++++++++++++++-
> > 2 files changed, 60 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> > index 2b8fd6bb7da0..d06e379b6281 100644
> > --- a/drivers/base/Kconfig
> > +++ b/drivers/base/Kconfig
> > @@ -243,3 +243,18 @@ config FW_DEVLINK_SYNC_STATE_TIMEOUT
> > work on.
> >
> > endmenu
> > +
> > +config DEVICE_SHUTDOWN_TIMEOUT
> > + bool "device shutdown timeout"
> > + default n
>
> That is the default, so no need for this.
Was mindful of the “slow” devices in the system with long IO wait times.
Will have to consider increasing DEVICE_SHUTDOWN_TIMEOUT_SEC and make this
a default y
>
>
> > + help
> > + Enable timeout for device shutdown. Helps in case device shutdown
> > + is hung during shoutdonw and reboot.
> > +
> > +
> > +config DEVICE_SHUTDOWN_TIMEOUT_SEC
> > + int "device shutdown timeout in seconds"
> > + default 5
> > + depends on DEVICE_SHUTDOWN_TIMEOUT
> > + help
> > + sets time for device shutdown timeout in seconds
>
> You need much more help text for all of these.
Will update the commit with more help text
>
> And why are these in the drivers/base/Kconfig file? It has nothing to
> do with "devices", or the driver core, it's all core kernel reboot
> logic.
This is handling only device shutdown, not complete reboot part,
so we want to keep it here.
>
>
> > diff --git a/kernel/reboot.c b/kernel/reboot.c
> > index 22c16e2564cc..8460bd24563b 100644
> > --- a/kernel/reboot.c
> > +++ b/kernel/reboot.c
> > @@ -18,7 +18,7 @@
> > #include <linux/syscalls.h>
> > #include <linux/syscore_ops.h>
> > #include <linux/uaccess.h>
> > -
> > +#include <linux/sched/debug.h>
>
> Why remove the blank line?
Will fix it.
>
> > /*
> > * this indicates whether you can reboot with ctrl-alt-del: the default is yes
> > */
> > @@ -48,6 +48,14 @@ int reboot_cpu;
> > enum reboot_type reboot_type = BOOT_ACPI;
> > int reboot_force;
> >
> > +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> > +struct device_shutdown_timeout {
> > + struct timer_list timer;
> > + struct task_struct *task;
> > +} devs_shutdown;
> > +#define SHUTDOWN_TIMEOUT CONFIG_DEVICE_SHUTDOWN_TIMEOUT_SEC
> > +#endif
>
> #ifdefs should not be in .c files, please put this in a .h file where it
> belongs. Same for the other #ifdefs.
Will fix it
>
>
>
> > +
> > struct sys_off_handler {
> > struct notifier_block nb;
> > int (*sys_off_cb)(struct sys_off_data *data);
> > @@ -88,12 +96,46 @@ void emergency_restart(void)
> > }
> > EXPORT_SYMBOL_GPL(emergency_restart);
> >
> > +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> > +static void device_shutdown_timeout_handler(struct timer_list *t)
> > +{
> > + pr_emerg("**** device shutdown timeout ****\n");
>
> What does this have to do with "devices"? This is a whole-system issue,
> or really a "broken driver" issue.
Shutdown/reboot procedure involves quite a few routines. We are specifically
dealing with a broken driver or device malfunction.
>
> > + show_stack(devs_shutdown.task, NULL, KERN_EMERG);
>
> How do you know this is the 'device shutdown' stack? What is a "device
> shutdown"?
The timeout handler is invoked in the context of reboot/shutdown process.
We are interested in the device shutdown callback which was stuck and
preventing the reboot/shutdown. By device shutdown, we refer to device
class/bus/driver shutdown calls.
>
> > + if (system_state == SYSTEM_RESTART)
> > + emergency_restart();
> > + else
> > + machine_power_off();
> > +}
> > +
> > +static void device_shutdown_timer_set(void)
> > +{
> > + devs_shutdown.task = current;
>
> It's just the normal shutdown stack/process, why call it a device?
We are specifically dealing with a broken driver or device malfunction,
So to indicate same, we want to name it as device_shutdown stack.
>
> > + timer_setup(&devs_shutdown.timer, device_shutdown_timeout_handler, 0);
> > + devs_shutdown.timer.expires = jiffies + SHUTDOWN_TIMEOUT * HZ;
> > + add_timer(&devs_shutdown.timer);
> > +}
> > +
> > +static void device_shutdown_timer_clr(void)
> > +{
> > + del_timer(&devs_shutdown.timer);
> > +}
> > +#else
> > +static inline void device_shutdown_timer_set(void)
> > +{
> > +}
> > +static inline void device_shutdown_timer_clr(void)
> > +{
> > +}
> > +#endif
> > +
> > void kernel_restart_prepare(char *cmd)
> > {
> > blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
> > system_state = SYSTEM_RESTART;
> > usermodehelper_disable();
> > + device_shutdown_timer_set();
> > device_shutdown();
> > + device_shutdown_timer_clr();
>
> Why isn't all of this done in device_shutdown() if you think it is a
> device issue? Why put it in reboot.c?
Will consider moving to device_shutdown()..
>
> thanks,
>
> greg k-h
Thanks and Regards,
Soumya.