The wdat_wdt driver is misusing the min_hw_heartbeat_ms field. This
field should only be used when the hardware watchdog device should not
be pinged more frequently than a specific period. The ACPI WDAT
"Minimum Count" field, on the other hand, specifies the minimum
timeout value that can be set. This corresponds to the min_timeout
field in Linux's watchdog infrastructure.
Setting min_hw_heartbeat_ms instead can cause pings to the hardware
to be delayed when there is no reason for that, eventually leading to
unexpected firing of the watchdog timer (and thus unexpected reboot).
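For illustration only, here is a simplified user-space model of the
deferral described above. It is a sketch, not the kernel's code: the
struct wd_model type and both helper functions are hypothetical, and
the model assumes the core postpones a ping until min_hw_heartbeat_ms
has elapsed since the last ping that reached the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the kernel's actual data structure. */
struct wd_model {
	uint64_t min_hw_heartbeat_ms; /* minimum spacing between hardware pings */
	uint64_t timeout_ms;          /* hardware timeout currently programmed */
	uint64_t last_hw_ping_ms;     /* time of the last ping that reached hardware */
};

/* Time at which a ping requested at now_ms actually reaches the hardware. */
static uint64_t effective_ping_time(const struct wd_model *wd, uint64_t now_ms)
{
	uint64_t earliest = wd->last_hw_ping_ms + wd->min_hw_heartbeat_ms;

	/* A ping that arrives too early is deferred until 'earliest'. */
	return now_ms < earliest ? earliest : now_ms;
}

/* True if the (possibly deferred) ping lands strictly before expiry. */
static bool ping_in_time(const struct wd_model *wd, uint64_t now_ms)
{
	return effective_ping_time(wd, now_ms) <
	       wd->last_hw_ping_ms + wd->timeout_ms;
}
```

In this model, if the timeout is set to the WDAT minimum and
min_hw_heartbeat_ms holds that same value, a ping requested well within
the timeout is pushed back to the exact moment the hardware expires.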
I'm also changing max_hw_heartbeat_ms to max_timeout for symmetry.
The use of max_hw_heartbeat_ms isn't fundamentally wrong, but there
is no reason to enable the software-driven ping mechanism for the
wdat_wdt driver.
Signed-off-by: Jean Delvare <jdelvare@xxxxxxx>
Fixes: 058dfc767008 ("ACPI / watchdog: Add support for WDAT hardware watchdog")
Cc: Wim Van Sebroeck <wim@xxxxxxxxxxxxxxxxxx>
Cc: Guenter Roeck <linux@xxxxxxxxxxxx>
Cc: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
Cc: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
---
Untested, as I have no supported hardware at hand.
Note to the watchdog subsystem maintainers: I must say I find the
whole thing pretty confusing.
First of all, the name symmetry between min_hw_heartbeat_ms and
max_hw_heartbeat_ms, while these properties are completely unrelated,
is heavily misleading. max_hw_heartbeat_ms is really max_hw_timeout
and should be renamed to that IMHO, if we keep it at all.
Secondly, the coexistence of max_timeout and max_hw_heartbeat_ms is
also making the code pretty hard to understand and get right.
Historically, max_timeout was already supposed to be the maximum
hardware timeout value. I don't understand why a new field with that
meaning was introduced, subsequently changing the original meaning of
max_timeout to become a software-only limit... but only if
max_hw_heartbeat_ms is set.
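To make the interaction concrete, here is a stand-alone sketch modelled
on my reading of the timeout validity check in watchdog.h. The struct
wdd_limits type and the timeout_invalid() name are hypothetical
stand-ins, and the in-tree helper may differ in detail.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct watchdog_device. */
struct wdd_limits {
	unsigned int min_timeout;         /* seconds */
	unsigned int max_timeout;         /* seconds, 0 means no limit */
	unsigned int max_hw_heartbeat_ms; /* milliseconds, 0 means unset */
};

/* Returns true if a requested timeout of t seconds must be rejected. */
static bool timeout_invalid(const struct wdd_limits *wdd, unsigned int t)
{
	return t > UINT_MAX / 1000 || t < wdd->min_timeout ||
	       /* The max_timeout bound only applies if max_hw_heartbeat_ms is 0. */
	       (!wdd->max_hw_heartbeat_ms && wdd->max_timeout &&
		t > wdd->max_timeout);
}
```

In this sketch, a request above max_timeout is rejected only while
max_hw_heartbeat_ms is zero. Once max_hw_heartbeat_ms is set, the same
request passes the check.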
To be honest, I'm not sold on the idea of a software-emulated
maximum timeout value above what the hardware can do, but if doing
that makes sense in certain situations, then I believe it should be
implemented as a boolean flag (named emulate_large_timeout, for
example) to complement max_timeout instead of a separate time value.
Is there a reason I'm missing, why it was not done that way?
Currently, a comment in watchdog.h claims that max_timeout is ignored
when max_hw_heartbeat_ms is set. However, in watchdog_dev.c, sysfs
attribute max_timeout is created unconditionally, and
max_hw_heartbeat_ms doesn't have a sysfs attribute. So userspace has
no way to know if max_timeout is the hardware limit, or whether
software emulation will kick in for a specified timeout value. Also,
there is no complaint if both max_hw_heartbeat_ms and max_timeout
are set.