Re: [Linaro-acpi] [PATCH v8 5/5] Watchdog: introduce ARM SBSA watchdog driver

From: Al Stone
Date: Thu Nov 19 2015 - 18:50:17 EST

Sorry for the delayed response...I've got some difficult family things to work
on IRL that are taking priority...

On 11/12/2015 05:23 PM, Timur Tabi wrote:
> On 11/12/2015 06:06 PM, Al Stone wrote:
>> If it is a NAK, that's fine, but I also want to be sure I understand what the
>> objections are. Based on my understanding of the discussion so far over the
>> multiple versions, I think the primary objection is that the use of pretimeout
>> makes this driver too complex, and indeed complex enough that there is some
>> concern that it could destabilize a running system. Do I have that right?
> I don't have a problem with the concept of pre-timeout per se. My primary
> objection is this code:
>> +static irqreturn_t sbsa_gwdt_interrupt(int irq, void *dev_id)
>> +{
>> + struct sbsa_gwdt *gwdt = (struct sbsa_gwdt *)dev_id;
>> + struct watchdog_device *wdd = &gwdt->wdd;
>> +
>> + /* We don't use pretimeout, trigger WS1 now */
>> + if (!wdd->pretimeout)
>> + sbsa_gwdt_set_wcv(wdd, 0);
> This driver depends on an interrupt handler in order to properly program the
> hardware. Unlike some other devices, the SBSA watchdog does not need assistance
> to reset on a timeout -- it is a "fire and forget" device. What happens if
> there is a hard lockup, and interrupts no longer work?

Aha. I see now. That helps clarify a lot. Thanks.
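
To put the concern in concrete terms, here is a toy model in plain C. It is
not driver code: the struct, its fields, and toy_reset_time() are invented for
illustration. It assumes that after the first stage (WS0) fires, the hardware
re-arms itself with the same offset, so the reset (WS1) only moves if an
interrupt handler actually runs and rewrites the compare value.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model, not driver code. Times are in seconds from the last refresh.
 * Assumption: WS0 fires at offset_s, and the hardware then re-arms with the
 * same offset, so WS1 (reset) fires at 2 * offset_s unless a handler
 * reprograms the compare value in between.
 */
struct toy_wdt {
	uint64_t offset_s;        /* hardware offset used for each stage */
	uint64_t wanted_gap_s;    /* WS0-to-WS1 gap the handler would program */
	bool handler_reprograms;  /* does the design rely on the WS0 handler? */
};

/* Returns the second at which the reset (WS1) fires. */
static uint64_t toy_reset_time(const struct toy_wdt *w, bool irqs_work)
{
	uint64_t ws0 = w->offset_s;
	uint64_t ws1 = ws0 + w->offset_s;   /* what the hardware does alone */

	/* The handler only gets a say if interrupts are still delivered. */
	if (w->handler_reprograms && irqs_work)
		ws1 = ws0 + w->wanted_gap_s;

	return ws1;
}
```

With offset_s = 10 and wanted_gap_s = 0 (the "trigger WS1 now" case in the
quoted handler), this model resets at 10 seconds while interrupts work, but
not until 20 seconds after a hard lockup. A design that never relies on the
handler resets at 20 seconds either way.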

> The reason why Fu does this is because he wants to support a pre-timeout value
> that's independent of the timeout value. The SBSA watchdog is normally
> programmed where real timeout equals twice the pre-timeout. I would prefer that
> the driver adhere to this limitation. That would eliminate the need to
> pre-program the hardware in the interrupt handler.

The "normally programmed" limitation described is interesting; forgive my
ignorance, but where is that specified? I couldn't find anything that specific
in the SBSA, or the ARM ARM, but I could have missed it. That being said,
keeping them independent at least seems like a good idea; if I think about
kdump/kexec or some other recovery mechanism wanting to perhaps copy part of
RAM or flush a filesystem/database, or maybe do some other magic to recover
enough to be able to reset the timer, that may be a really long interval on a
large server. I could easily see that being very different from a watchdog
timer that's meant to just make sure the platform is still making progress.
Conversely, I could see that recovery interval being very small or zero on a
guest OS, for example, with the watchdog timeout still set to something else.
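
As a rough sketch of the arithmetic (the helper names are made up, and it
assumes the coupling Timur describes, where the time to reset is twice the
time to the first-stage signal), this checks whether a requested pair of
times can be expressed by the single hardware offset alone:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; all times are in seconds. */

/* With only one offset in play, reset comes after two equal stages. */
static uint64_t coupled_reset_s(uint32_t first_stage_s)
{
	return 2ULL * first_stage_s;
}

/*
 * True if the requested (first stage, reset) pair cannot be expressed by
 * the single offset, i.e. something would have to reprogram the hardware
 * after the first stage fires.
 */
static bool needs_handler(uint32_t first_stage_s, uint32_t reset_s)
{
	return (uint64_t)reset_s != coupled_reset_s(first_stage_s);
}
```

For example, a 10 second progress check followed by a 600 second recovery
window is the pair (10, 610), which needs_handler() flags as requiring
reprogramming. The pair (30, 60) does not.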

>> And finally, a simpler, single stage timeout watchdog driver would be a
>> reasonable thing to accept, yes? I can see where that would make sense.
> I would be okay with merging such a driver, and then enhancing it later to add
> pre-timeout support.
>> The issue for me in that case is that the SBSA requires a two stage timeout,
>> so a single stage driver has no real value for me.
> There are plenty of existing watchdog devices that have a two-stage timeout but
> the driver treats it as a single stage. The PowerPC watchdog driver is like
> that. The hardware is programmed for the second stage to cause a hardware
> reset, and the interrupt handler is typically a no-op or just a printk().

Hrm. Thanks for the pointer. I _think_ I see a way to do that with arm64, and
perhaps combine this driver's functionality with what Timur did originally, but
still have it reasonably straightforward. I need to do the experiments, though,
and see if it actually works first.

Al Stone
Software Engineer
Linaro Enterprise Group
