Re: [PATCH v4 2/3] watchdog: introduce watchdog.open_timeout commandline parameter

From: Guenter Roeck
Date: Wed Jan 11 2017 - 06:02:19 EST


On 01/11/2017 12:11 AM, Rasmus Villemoes wrote:
On 2017-01-10 19:08, Guenter Roeck wrote:
On Mon, Jan 09, 2017 at 04:02:32PM +0100, Rasmus Villemoes wrote:

+static unsigned open_timeout;
+module_param(open_timeout, uint, 0644);
+
+static bool watchdog_past_open_deadline(struct watchdog_core_data *data)
+{
+ if (!open_timeout)
+ return false;
+ return time_is_before_jiffies(data->open_deadline);

Doesn't this return true if the time is _before_ the open deadline ?
Should it be time_is_after_jiffies() ?

Yes, time_is_before_jiffies(foo) means foo < jiffies, and that is what we want when we're querying whether we've passed the deadline.

So you want the function to return true if the timeout has _not_ expired ?

+}
+
+static void watchdog_set_open_deadline(struct watchdog_core_data *data)
+{
+ data->open_deadline = jiffies + msecs_to_jiffies(open_timeout);

The open deadline as defined applies to the time after the device was
instantiated, not to the time since boot. Would it be better to make it
"time since boot" ?

I don't have a strong opinion on that, but two small things made me do it this way: (1) In case a hardware watchdog

is somehow hotplugged long after boot and userspace is supposed to detect that and start feeding it, it wouldn't make

sense for the framework not to take care of it for a while. (2) The open_timeout also applies to the case where the

userspace app gracefully closes the watchdog device (e.g. because it's been instructed to restart to load a new configuration

or whatnot) but the hardware cannot be stopped. In that case, the framework also takes over, and the same logic as after boot should apply

- if the app fails to come up again, the framework should not feed the dog indefinitely, but OTOH it clearly doesn't make sense to have a boot-time based deadline.


[ Can you try to work with line wraps ? ]

Uuh, no. I didn't realize that you apply that case to the "userspace app gracefully
closes the watchdog device" case as well. This is clearly a separate use case.

Looks like there are now three use cases for 'open timeout'.
- time after boot
- timer after loading the watchdog device
- time after closing watchdog device and before reopening it

I would have thought the first use case is the important one, and the other ones are,
at best, secondary. Given that, we are clearly not there yet. This will require input
from others on the semantics.

Thanks,
Guenter

Also, are you sure about using milli-seconds (instead of seconds) ?
I can not really imagine a situation where this would be needed
(especially and even more so in the context of using "time after
instantiating").

I went back and forth on this. I did milli-seconds because that should cover more use cases. Yes, wanting an open timeout of .7 seconds with 1.0 seconds being unacceptable is unusual, but I know of some customers with very strict requirements. Also, even if one cannot make userspace start that fast, one can boot with a somewhat generous open_timeout and then write 700 to /sys/module/watchdog/parameters/open_timeout for use in case (2) above.

Thanks,
Rasmus