Re: [PATCH v4 0/2] Add stop_on_panic support for watchdog

From: Ahmad Fatoum
Date: Wed Mar 05 2025 - 07:24:57 EST


Hello George,

On 05.03.25 13:15, George Cherian wrote:
>> On 05.03.25 12:28, George Cherian wrote:
>>>> that can't be disabled and would protect against system lock up:
>>>> Consider a memory-corruption bug (perhaps externally via DMA), which partially
>>>> overwrites both main and kdump kernel. With a disabled watchdog, the system
>>>> may not be able to recover on its own.
>>>
>>> Yes, that is the reason why the kernel command-line parameter is optional and set to zero by default,
>>> so that the watchdog still kicks in if you have a corrupted kdump kernel.
>>
>> The existing option isn't enough for the kdump kernel use case.
>> If we (i.e. you) are going to do something about it, wouldn't it be
>> better to have a solution that's applicable to a wider number of
>> watchdog devices?
>
> I need a slight clarification here.
> 1. reset_on_panic takes the number of seconds to be reloaded into the watchdog HW, so that it initiates a
> watchdog reset after the specified timeout if the kdump kernel fails to boot or hangs while booting.

Yes.

> 2. If reset_on_panic = 0, it behaves like stop_on_panic=1.
> Is this what you meant?

Alternatively, reset_on_panic = 0 could also mean stopping the watchdog as
you do now. I haven't thought through yet what would make the most sense.
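
To make the two cases concrete, here is a rough user-space model of the
semantics under discussion. It is only a sketch: struct model_wdt and
model_reset_on_panic() are made-up names for illustration, not kernel APIs,
and the meaning of reset_on_panic = 0 shown here (stop the watchdog) is just
one of the options, not a settled decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a watchdog; NOT the kernel's struct watchdog_device. */
struct model_wdt {
	bool running;          /* hardware is counting down */
	unsigned int timeout;  /* programmed timeout in seconds */
};

/*
 * Model of what a panic handler might do with a reset_on_panic value:
 *   > 0: reload the hardware with that many seconds and leave it running,
 *        so a kdump kernel that fails to boot or hangs still gets reset
 *   = 0: stop the watchdog (assumed here to match stop_on_panic=1)
 */
static void model_reset_on_panic(struct model_wdt *wdt,
				 unsigned int reset_on_panic)
{
	assert(wdt != NULL);

	if (reset_on_panic == 0) {
		wdt->running = false;
		return;
	}

	wdt->timeout = reset_on_panic;
	wdt->running = true;
}
```

With reset_on_panic=120, a watchdog running with a 30 second timeout would
end up running with 120 seconds; with reset_on_panic=0 it would end up stopped.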

> I would let Guenter comment on this approach.

+1.

>> If you are serious with the watchdog use, you'll want to use the watchdog to
>> monitor kernel startup as well. If the bootloader can set a watchdog timeout
>> just before starting the kernel and it doesn't expire before the kernel watchdog
>> driver takes over, why can't we do the same just before starting the dumpkernel?
>
> Yes, in an ideal world with ideal HW. But there is HW with issues that cannot support a large
> enough watchdog timeout. Such HW would boot from FW to kernel without the watchdog enabled,
> and stop_on_panic does something similar for the kdump kernel too.

Yes, but there are likely more kinds of watchdog devices that cannot be disabled,
so it makes sense to have a solution that is more broadly applicable from the get-go.
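
As a rough illustration of what "more broadly applicable" could look like,
the following user-space sketch picks between stopping and re-arming depending
on what the device can do. Everything in it is an assumption for the sake of
the example: struct sketch_wdt, its can_stop and max_timeout fields and
sketch_prepare_for_kdump() are hypothetical names, and the clamping policy is
just one possible choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a watchdog; NOT the kernel's struct watchdog_device. */
struct sketch_wdt {
	bool can_stop;             /* hardware supports being disabled */
	bool running;              /* hardware is counting down */
	unsigned int timeout;      /* programmed timeout in seconds */
	unsigned int max_timeout;  /* largest timeout the hardware accepts */
};

enum sketch_panic_action { SKETCH_STOPPED, SKETCH_REARMED };

/*
 * Prepare the watchdog before jumping into the kdump kernel.
 *   want_secs == 0 and the device can stop: stop it.
 *   otherwise: re-arm it, clamped to max_timeout. A device that cannot be
 *   stopped and got want_secs == 0 is re-armed with max_timeout instead.
 */
static enum sketch_panic_action
sketch_prepare_for_kdump(struct sketch_wdt *wdt, unsigned int want_secs)
{
	assert(wdt != NULL);

	if (want_secs == 0 && wdt->can_stop) {
		wdt->running = false;
		return SKETCH_STOPPED;
	}

	if (want_secs == 0 || want_secs > wdt->max_timeout)
		want_secs = wdt->max_timeout;

	wdt->timeout = want_secs;
	wdt->running = true;
	return SKETCH_REARMED;
}
```

The point of the sketch is only that a timeout-based fallback also covers
devices that cannot be disabled, while stoppable devices keep the current
stop behaviour.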

Cheers,
Ahmad

>
> -George
>>
>> Thanks,
>> Ahmad
>>> Changelog:
>>> v1 -> v2
>>> - Remove the per driver flag setting option
>>> - Take the parameter via kernel command-line parameter to watchdog_core.
>>>
>>> v2 -> v3
>>> - Remove the helper function watchdog_stop_on_panic() from watchdog.h.
>>> - There are no users for this.
>>>
>>> v3 -> v4
>>> - Since the panic notifier is in atomic context, watchdog functions
>>> which sleep can't be called.
>>> - Add an options flag WDIOF_STOP_MAYSLEEP to indicate whether stop
>>> function sleeps.
>>> - Simplify the stop_on_panic kernel command line parsing.
>>> - Enable the panic notifier only if the watchdog stop function doesn't
>>> sleep
>>>
>>> George Cherian (2):
>>> watchdog: Add a new flag WDIOF_STOP_MAYSLEEP
>>> drivers: watchdog: Add support for panic notifier callback
>>
>> - George
>
>


--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |