Re: [PATCH 2/8] watchdog: Introduce hardware maximum timeout in watchdog core

From: Uwe Kleine-König
Date: Wed Aug 05 2015 - 04:23:07 EST


Hello Guenter,

On Tue, Aug 04, 2015 at 09:03:27AM -0700, Guenter Roeck wrote:
> On 08/04/2015 08:52 AM, Uwe Kleine-König wrote:
> >On Tue, Aug 04, 2015 at 08:31:43AM -0700, Guenter Roeck wrote:
> >>On 08/04/2015 05:18 AM, Uwe Kleine-König wrote:
> >>>On Mon, Aug 03, 2015 at 07:13:28PM -0700, Guenter Roeck wrote:
> >>>>structure. If the configured timeout exceeds half the value of the
> >>>>maximum hardware timeout, the watchdog core enables a timer function
> >>>>to assist sending keepalive requests to the watchdog driver.
> >>>I don't understand why you want to halve the maximum hw-timeout. If my
> >>>watchdog has hw-max-timeout = 5s and userspace sets it to 3s there
> >>>should be no need for assistance?! I think the implementation is the
> >>>other way round?
> >>>
> >>It is supposed to reflect the _maximum_ timeout. That is different to
> >>the time between heartbeats, which is supposed to be less; using half
> >>the value of the maximum hardware timeout seemed to be a safe number.
> >Right, I got that. With hw-max-timeout = 5s the machine resets after 5s
> >not caring for the device. And so pinging repeatedly after 2.5s is fine.
> >But if userspace sets a timeout of 3s (probably with the intention to
> >ping with a frequency of 1/1.5s) there is no need for worker-assistance,
> >because the pings coming in each 1.5s provided by userspace are good
> >enough.
> >
> Yes, that is how it is supposed to work.
So for the changelog you want:

If the configured timeout exceeds the maximum hardware timeout
the watchdog core enables a timer function ...

right?

> >>>>+static inline bool watchdog_need_worker(struct watchdog_device *wdd)
> >>>>+{
> >>>>+ unsigned int hm = wdd->max_hw_timeout_ms;
> >>>>+ unsigned int m = wdd->max_timeout * 1000;
> >>>>+
> >>>>+ return watchdog_active(wdd) && hm && hm != m &&
> >>>>+ wdd->timeout * 500 > hm;
One problem I see with the worker is that it will probably delay the
reset. Consider userspace sets timeout = 10 s because something dangerous
can happen if the main application doesn't work for 12 s. (Consider a
guillotine where the blade can only be held up for 12 s when not
locked. :-) Now if the hw-max-timeout is 9 s you set up a timer to ping
at $last_keepalive + 4.5 s and $last_keepalive + 9 s (not taking timer
and system latency into account). That means the system only resets
18 s after the last userspace ping. Oops.

So ideally you send the last auto-ping at $last_keepalive +
$configured_timeout - $hw-max-timeout (assuming the hardware is
configured for $hw-max-timeout).
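
To put numbers on this, here is a rough userspace sketch in C (not kernel
code). The helper names and the model of the worker (ping every
hw-max-timeout / 2 for as long as the ping time does not exceed the
configured timeout) are my assumptions for illustration, not necessarily
what the patch does:

```c
/* All times are in milliseconds, relative to the last userspace keepalive. */

/*
 * Assumed worker model: ping at multiples of hw_max_ms / 2 while the
 * ping time is <= timeout_ms. The hardware then fires hw_max_ms after
 * the last of these pings.
 */
static unsigned int naive_reset_ms(unsigned int timeout_ms,
				   unsigned int hw_max_ms)
{
	unsigned int interval = hw_max_ms / 2;
	unsigned int last_ping;

	if (!interval)
		return hw_max_ms;	/* degenerate limit, no worker pings */

	last_ping = (timeout_ms / interval) * interval;
	return last_ping + hw_max_ms;
}

/*
 * Suggested alternative: send the last automatic ping so that the
 * hardware fires exactly at timeout_ms. No automatic ping is needed
 * when the hardware limit already covers the configured timeout.
 */
static unsigned int ideal_last_ping_ms(unsigned int timeout_ms,
				       unsigned int hw_max_ms)
{
	return timeout_ms > hw_max_ms ? timeout_ms - hw_max_ms : 0;
}
```

With timeout = 10 s and hw-max-timeout = 9 s, naive_reset_ms(10000, 9000)
gives 18000, while ideal_last_ping_ms(10000, 9000) gives 1000, so the
hardware would fire at 1000 + 9000 = 10000 ms as userspace asked for.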

> >>>I don't understand what max_timeout is now that there is max_hw_timeout.
> >>>So I don't understand why you need hm != m either.
> >>>
> >>
> >>Backward compatibility. A driver which does not set max_hw_timeout_ms,
> >>or sets both to the same value, by definition expects to handle everything
> >>internally, and thus no worker is configured.
> >And a driver that does
> >
> > max_timeout = 5
> > max_hw_timeout = 5125
> >
> >falls through the cracks.
> >
> Hmm - not that this configuration makes any sense, but you are right.
> I'll make it "hm < m".
It does not? What do you expect max_timeout to be set to if the maximum
hw-timeout is 5125 ms? 0 would work, but IMHO you need some more
documentation then.
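
For reference, a small sketch of how the two comparisons treat that
configuration. This only models the hm/m part of the check, not the whole
of watchdog_need_worker(); struct wdd_limits and both helper names are
made up for the example:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two struct watchdog_device fields used. */
struct wdd_limits {
	unsigned int max_timeout;	/* seconds */
	unsigned int max_hw_timeout_ms;	/* milliseconds */
};

/* As in the quoted patch: the limits count as distinct unless equal. */
static bool limits_differ_ne(const struct wdd_limits *w)
{
	unsigned int hm = w->max_hw_timeout_ms;
	unsigned int m = w->max_timeout * 1000;

	return hm && hm != m;
}

/* As proposed in the reply: only a hardware limit below max_timeout counts. */
static bool limits_differ_lt(const struct wdd_limits *w)
{
	unsigned int hm = w->max_hw_timeout_ms;
	unsigned int m = w->max_timeout * 1000;

	return hm && hm < m;
}
```

For max_timeout = 5 and max_hw_timeout_ms = 5125 the first helper returns
true and the second returns false; for max_timeout = 5 and
max_hw_timeout_ms = 5000 both return false.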

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |