Re: [non-pretimeout,4/7] Watchdog: introduce ARM SBSA watchdog driver

From: Guenter Roeck
Date: Tue Jun 23 2015 - 11:21:50 EST


On Tue, Jun 23, 2015 at 09:26:35PM +0800, Fu Wei wrote:
> Hi Guenter,
[ ...]

> >
> >> + * When the first timeout occurs, WS0(SPI or LPI) is triggered,
> >> + * the second timeout period(as long as the first timeout period) starts.
> >
> > no longer accurate if WOR is used for the second period.
> >
> >> + * In WS0 interrupt routine, panic() will be called for collecting
> >> + * crashdown info.
> >> + * If system can not recover from WS0 interrupt routine, then second
> >> + * timeout occurs, WS1(reset or higher level interrupt) is triggered.
> >> + * The two timeout period can be set by WOR(32bit).
> >
> > The second timeout period is determined by ...
> >
> >> + * WOR gives a maximum watch period of around 10s at the maximum
> >> + * system counter frequency.
> >> + * The System Counter shall run at maximum of 400MHz.
> >
> > "... at the maximum system counter frequency of 400 MHz.", and drop the
> > last sentence.
>
> For the second timeout period, I have discussed this with a kdump developer:
> (1) 10s may not be long enough for every case of panic + kdump, so
> we may still need to use WCV in the second timeout period.
> (2) In the second timeout period, we may need to program WCV for
> two reasons: a, to trigger WS1 and reboot the system ASAP; b, to feed the
> watchdog without clearing the WS0 flag.
>
> Why would we want to feed the watchdog (keepalive) without clearing the
> WS0 flag?
> Reasons:
> (1) If the system context is large, we may need to keep feeding the dog
> until everything has been backed up.
> (2) If the system goes wrong, WS0 is triggered, then panic --> kdump. If we
> feed the dog through WRR or by programming WOR, the WS0 flag will be
> cleared. Once the system goes wrong again, it panics again...
> So the system will be stuck in a panic--kdump--panic--kdump loop, with no
> chance to reset.
>
> So if we are in the second timeout period, we may need to always program WCV.
>
The crashdump kernel is supposed to reload the watchdog driver, which will ping
the watchdog. If it isn't able to do that in 10 seconds, something is wrong.
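
For reference, the 10 second figure follows from the 32-bit width of WOR: the
longest period it can hold is 2^32 - 1 counter ticks divided by the system
counter frequency. A minimal sketch of that arithmetic, assuming a made-up
helper name that is not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not from the driver).
 * Returns the largest timeout, in whole seconds, that a 32-bit WOR
 * can express when the system counter runs at freq_hz.
 */
static uint32_t wor_max_timeout_secs(uint64_t freq_hz)
{
	assert(freq_hz != 0);
	/* WOR holds at most UINT32_MAX ticks; divide by ticks per second. */
	return (uint32_t)(UINT32_MAX / freq_hz);
}
```

At 400 MHz this yields 10 seconds; a slower counter gives proportionally more
headroom (42 seconds at 100 MHz).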

> >> +
> >> + status = readl_relaxed(gwdt->control_base + SBSA_GWDT_WCS);
> >> + if (status & SBSA_GWDT_WCS_WS1) {
> >> + dev_warn(dev, "System reset by WDT(WCV: %llx)\n",
> >> + sbsa_gwdt_get_wcv(wdd));
> >
> > WCV here only tells us how many clock cycles were executed since the
> > system started (or something like that). So I still don't understand
> > why it is valuable to print that number.
>
> This number provides the time of the system reset; I think that may help
> the admin to analyse the system failure.
>
It doesn't mean anything to anyone but you since it is not in a well defined
time scale. Also, I would be somewhat surprised if WCV would retain its value
on reset. Much more likely it is the time (in clock cycles) since reset.
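
To illustrate the time scale point: a raw counter value only becomes a time
once it is divided by the counter frequency. A rough sketch, assuming a
hypothetical helper that is not part of the patch and a caller that already
knows the frequency:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not from the driver).
 * Converts a raw system counter value in ticks to whole milliseconds.
 */
static uint64_t counter_ticks_to_ms(uint64_t ticks, uint64_t freq_hz)
{
	assert(freq_hz != 0);
	/* Handle whole seconds and the remainder separately so that
	 * ticks * 1000 cannot overflow for large counter values. */
	return (ticks / freq_hz) * 1000 + ((ticks % freq_hz) * 1000) / freq_hz;
}
```

Even after conversion, the result is only relative to whenever the counter
last started counting, so it is not a wall-clock timestamp.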

Guenter