Re: [PATCH] Improve clocksource unstable warning

From: john stultz
Date: Fri Nov 12 2010 - 16:32:06 EST


On Wed, Nov 10, 2010 at 2:16 PM, Andy Lutomirski <luto@xxxxxxx> wrote:
> When the system goes out to lunch for a long time, the clocksource
> watchdog might get false positives.  Clarify the warning so that
> people stop blaming their system freezes on the timing code.

So this issue has also crept up at times in the -rt kernel, where some
high priority rt tasks hogs the cpu long enough for the watchdog
clocksource to wrap, and then the TSC seems to have sped up relative
to the watchdog when the system returns to a normal state and causes a
false positive, so the TSC gets kicked out.

So the watchdog heuristic is rough, since it depends on three things:
1) The watchdog clocksource is actually trustworthy
2) The watchdog actually runs near to when it expects to run.
3) When reading the different clocksources, any interruptions when the
watchdog runs are small enough to to be handled by the allowed margin
of error.

And in these cases we're breaking #2

So I had a few ideas to improve the huristic somewhat (and huristic
refinement is always ugly). So first of all, the whole reason for the
watchdog code is the TSC. Other platforms may want to make use of it,
but none do at this moment.

The majority of TSC issues are caused when the TSC halts in idle
(probably most common these days) or slows down due to cpufreq
scaling.

When we get these false positives, its because the watchdog check
interval is too long and the watchdog hardware has wrapped. This makes
it appear that the TSC is running too *fast*. So we could try to be
more skeptical of watchdog failures where the TSC appears to have sped
up, rather then the more common real issue of it slowing down.

Now the actual positive where this could occur is if you were on an
old APM based laptop, and booted on battery power and the cpus started
at their slow freq. Then you plugged in the AC and the cpus jumped to
the faster rate. So while I suspect this is a rare case these days (on
ACPI based hardware the kernel has more say in cpufreq transitions, so
I believe we are unlikely to calculate tsc_khz while the cpus are in
low power mode), we still need to address this.

Ideas:
1) Maybe should we check that we get two sequential failures where the
cpu seems fast before we throw out the TSC? This will still fall over
in some stall cases (ie: a poor rt task hogging the cpu for 10
minutes, pausing for a 10th of a second and then continuing to hog the
cpu).

2) We could look at the TSC delta, and if it appears outside the order
of 2-10x faster (i don't think any cpus scale up even close to 10x in
freq, but please correct me if so), then assume we just have been
blocked from running and don't throw out the TSC.

3) Similar to #2 we could look at the max interval that the watchdog
clocksource provides, and if the TSC delta is greater then that, avoid
throwing things out. This combined with #2 might narrow out the false
positives fairly well.

Any additional thoughts here?

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/