Re: [PATCH RFC] clocksource: Detect a watchdog overflow

From: Gratian Crisan
Date: Thu Apr 07 2016 - 05:49:20 EST



John Stultz writes:

> On Tue, Mar 15, 2016 at 11:50 AM, Gratian Crisan <gratian.crisan@xxxxxx> wrote:
>> The clocksource watchdog can falsely trigger and disable the main
>> clocksource when the watchdog wraps around.
>>
>> The reason is that an interrupt storm and/or high priority (FIFO/RR) tasks
>> can preempt the timer softirq long enough for the watchdog to wrap around
>> if it has a limited number of bits available by comparison to the main
>> clocksource. One observed example is on a Intel Baytrail platform where TSC
>> is the main clocksource, HPET is disabled due to a hardware bug and acpi_pm
>> gets selected as the watchdog clocksource.
>>
>> Calculate the maximum number of nanoseconds the watchdog clocksource can
>> represent without overflow and do not disqualify the main clocksource if
>> the delta since the last time we have checked exceeds the measurement
>> capabilities of the watchdog clocksource.
>
> Sorry for not getting back to you sooner on this. You managed to send
> these both out while I was at a conference and on vacation, and so
> they were deep in the mail backlog. :)

No worries, I'm actually "on the road" this week too (ELC). I appreciate
the reply.

> So I'm sympathetic to this issue, because I remember seeing similar
> problems w/ runaway SCHED_FIFO tasks w/ PREEMPT_RT.

Yeah, a runaway rt thread can easily do it. That's just bad design. In
our case it was a bit more subtle bc. it was a combination of high
priority interrupts and rt threads that would occasionally stack up to
delay the timer softirq long enough to cause the watchdog wrap.

> However, its really difficult to create a solution without opening new
> cases where bad clocksources will be mis-identified as good (which
> your solution seems to suffer as well, measuring the time past with a
> known bad clocksource can easily result in large deltas, which will be
> ignored if the watchdog has a short interval).

Fair point. Ultimately you have to trust one of the clocksources. I
guess I was naive in thinking that the main clocksource can't drift more
than what the watchdog clocksource can measure within the
WATCHDOG_INTERVAL. I'm glad I don't have to deal with hardware that
lobotomized.

Would a simple solution that exposes the config option for the
clocksource wathchdog[1] (and defaults it to on) be an acceptable
alternative? It will work for us because we test the stability of the
main clocksource - part of the hardware bring-up.

> A previous effort on this was made here, and there's a resulting
> thread that didn't come to resolution:
> https://lkml.org/lkml/2015/8/17/542

Sorry I've missed it.

> Way back I had tried to come up with an approach where if the time
> delta was large, it was divided by the watchdog interval, and then we
> just compared the remainder with the current watchdog delta to see if
> they matched (closely enough). Unfortunately this didn't work out for
> me then, but perhaps it deserves a second try?

I've entertained that idea too but I think I was trying to optimize
things too early and do everything with the mult/shift math. That first
attempt failed but I do need to try harder because it would be a better
general solution.

> thanks
> -john

Thanks,
-Gratian

[1]