Re: [PATCH v2] reboot: Backup orderly_poweroff

From: Keerthy
Date: Tue Jan 19 2016 - 05:34:08 EST


Hi Ingo,

On Tuesday 19 January 2016 02:36 PM, Ingo Molnar wrote:

* Grygorii Strashko <grygorii.strashko@xxxxxx> wrote:

On 01/15/2016 12:14 PM, Ingo Molnar wrote:

* One Thousand Gnomes <gnomes@xxxxxxxxxxxxxxxxxxx> wrote:

If kernel_power_off() is called then the system should power off. No ifs and
whens.

Even if it doesn't the watchdog should kill it.

That is broken on some platforms on the watchdog side as the
watchdog shuts down during our power off callbacks - because the system
firmware is too stupid to reset the watchdog as it powers back up (so
keeps rebooting).

If your watchdog and firmware function properly you shouldn't even have to
care if you crash during the kernel power off.

That's a good point as well - if the system is 'stuck' for some notion of stuck,
then watchdog drivers can help.


It seems ARM doesn't implement an endless loop in machine_power_off(), so
there is not much chance for the watchdog to fire.
void machine_power_off(void)
{
	local_irq_disable();
	smp_send_stop();

	if (pm_power_off)
		pm_power_off();

	/* --- endless loop ? */
	/* --- or restart ? */
}
[And even if it were there, 20-30 sec is the usual watchdog timeout, and that is
enough time to burn the system in case of a thermal emergency poweroff :(]

Here it's unclear whether user-space even called the sys_reboot() system call.


That's true - original log [1] has
Nov 30 11:19:22 [ 5.942769] thermal thermal_zone3: critical temperature reached(108 C),shutting down
[...]
Nov 30 11:19:24 [ 7.387900] ahci 4a140000.sata: flags: 64bit ncq sntf stag pm led clo only pmp pio slum part ccc apst
Nov 30 11:19:24 INIT: Switching to runlevel: 0
Nov 30 11:19:24 INIT: Sending processes the TERM signal

and there is no
[ 220.004522] reboot: Power down


Also, it's not the first time this part of the code (thermal emergency poweroff) has been
discussed [2], so the real question, to me, is whether it is really required and safe to use
orderly_poweroff() for a thermal emergency poweroff ([3] as an example).

In general, this kind of use case can be simulated using SysRq on any arch
- [3.290034] Freeing unused kernel memory: 492K (c0a67000 - c0ae2000)
INIT: version 2.88 booting
Starting udev
^^ The issue most probably happens while the system is in the process of loading modules.
So, once module loading has started, fire SysRq "poweroff(o)".

So I'd say emergency poweroff should be named accordingly - and the
orderly_poweroff() name suggests anything but an emergency, right?

So I'd be fine with the following:

- introduce a poweroff_emergency() core kernel function call

- use it in drivers where it's justified

- poweroff_emergency() has a configurable timeout value. If the timeout value is
set to 0 then it powers the system off immediately.

Functionally it would be mostly equivalent to your current patch (except the '0'
immediate poweroff functionality).
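
A rough user-space sketch of the behaviour proposed above, purely as an
illustration. Only the poweroff_emergency() name and the "timeout 0 means
immediate poweroff" rule come from this thread. The hook structure, the
simulated machine, the return values, and the reading that a non-zero timeout
means "try an orderly poweroff first, then force it once the timeout expires"
are assumptions, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which path ended up powering the system off (hypothetical return type). */
enum poweroff_path { POWEROFF_ORDERLY, POWEROFF_FORCED };

/* Hypothetical hooks standing in for the platform-specific pieces. */
struct poweroff_hooks {
	void (*request_orderly)(void *ctx);          /* ask user space to shut down */
	bool (*powered_off)(void *ctx);              /* has the shutdown completed? */
	void (*sleep_ms)(void *ctx, unsigned int ms); /* wait for the given time */
	void (*force_off)(void *ctx);                /* cut power immediately */
};

/*
 * Model of the proposal: a timeout of 0 forces the poweroff at once.
 * Otherwise an orderly poweroff is requested and, if it has not completed
 * when timeout_ms has elapsed, the forced path is taken as a backup.
 */
enum poweroff_path poweroff_emergency(const struct poweroff_hooks *h,
				      void *ctx, unsigned int timeout_ms)
{
	unsigned int waited = 0;

	assert(h != NULL && h->request_orderly && h->powered_off &&
	       h->sleep_ms && h->force_off);

	if (timeout_ms == 0) {
		h->force_off(ctx);
		return POWEROFF_FORCED;
	}

	h->request_orderly(ctx);
	while (waited < timeout_ms) {
		if (h->powered_off(ctx))
			return POWEROFF_ORDERLY;
		h->sleep_ms(ctx, 1);
		waited++;
	}
	if (h->powered_off(ctx))
		return POWEROFF_ORDERLY;

	h->force_off(ctx);
	return POWEROFF_FORCED;
}

/* Simulated machine used to exercise the model without real hardware. */
struct sim_machine {
	unsigned int shutdown_takes_ms; /* how long user space needs */
	unsigned int now_ms;            /* simulated clock */
	bool orderly_requested;
	bool forced;
};

static void sim_request_orderly(void *ctx)
{
	((struct sim_machine *)ctx)->orderly_requested = true;
}

static bool sim_powered_off(void *ctx)
{
	struct sim_machine *m = ctx;

	return m->orderly_requested && m->now_ms >= m->shutdown_takes_ms;
}

static void sim_sleep_ms(void *ctx, unsigned int ms)
{
	((struct sim_machine *)ctx)->now_ms += ms;
}

static void sim_force_off(void *ctx)
{
	((struct sim_machine *)ctx)->forced = true;
}

static const struct poweroff_hooks sim_hooks = {
	.request_orderly = sim_request_orderly,
	.powered_off = sim_powered_off,
	.sleep_ms = sim_sleep_ms,
	.force_off = sim_force_off,
};
```

With these simulated hooks, a machine whose user space needs 50 ms goes down
through the orderly path when given a 100 ms timeout, while one that needs
1000 ms is forced off once the 100 ms expire. A timeout of 0 skips the orderly
request entirely.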

Thanks for the suggestion. I will work on this and get back.

Best Regards,
Keerthy


Thanks,

Ingo