Re: [PATCH] nohz: add missing handling of clocksource watchdog

From: Bartlomiej Zolnierkiewicz
Date: Sat Dec 13 2008 - 09:50:19 EST


On Monday 08 December 2008, Thomas Gleixner wrote:
> On Mon, 8 Dec 2008, Bartlomiej Zolnierkiewicz wrote:
>
> > On Monday 08 December 2008, Sergei Shtylyov wrote:
> > > Hello.
> > >
> > > R. J. Wysocki wrote:
> > > > On Monday, 8 of December 2008, Bartlomiej Zolnierkiewicz wrote:
> > > >
> > > >> Fixes "Clocksource tsc unstable (delta = -974982308 ns)" problem.
> > > >>
> > > >
> > > > Where can I find the description of the problem?
> > > >
> > >
> > > Kernel.org bug #10216 (it's not about this issue though).
> >
> > Looking back at bug #10216 the patch is unlikely to help Lars' system
> > since it doesn't use nohz and there are also warnings about unsynced
> > TSCs (I just don't know why not for 2.6.27-rc kernels)...
> >
> > > > Rafael
> > > >
> > > >
> > > >> [ IDE was unlucky to be initialized at the same time that
> > > >> clocksource watchdog triggers and was blamed for the issue. ]
> > > >>
> > >
> > > I think it might well have been blamed correctly -- the clocksource
> > > watchdog timer gets run every half second.
> >
> > AFAICS this is a special one for measuring stability of the clocksource
> > only -- by comparing the value returned by a given clocksource with the
> > reference clocksource. Normal drivers have no bussiness there...
> >
> > Anyway I forgot to add that it fixes the issue for me under QEMU (on my
> > laptop TSC is unstable due to halts in idle) and it could be as well QEMU's
> > oddity (although it looks like a legit kernel problem). v2 of the patch
> > below (updated patch description).
> >
> > From: Bartlomiej Zolnierkiewicz <bzolnier@xxxxxxxxx>
> > Subject: [PATCH v2] nohz: add missing handling of clocksource watchdog
> >
> > Fixes "Clocksource tsc unstable (delta = -974982308 ns)" problem
> > under QEMU.
> >
> > Cc: Sergei Shtylyov <sshtylyov@xxxxxxxxxxxxx>
> > Cc: Lars Winterfeld <lars.winterfeld@xxxxxxxxxxxxx>
> > Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@xxxxxxxxx>
> > ---
> > kernel/time/tick-sched.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > Index: b/kernel/time/tick-sched.c
> > ===================================================================
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -21,6 +21,7 @@
> > #include <linux/sched.h>
> > #include <linux/tick.h>
> > #include <linux/module.h>
> > +#include <linux/clocksource.h>
> >
> > #include <asm/irq_regs.h>
> >
> > @@ -153,6 +154,7 @@ void tick_nohz_update_jiffies(void)
> > local_irq_restore(flags);
> >
> > touch_softlockup_watchdog();
> > + clocksource_touch_watchdog();
>
> NAK.
>
> If this happens then the watchdog logic did not manage to schedule the
> watchdog timer in time. This patch just papers over the real problem.
>
> We do not fix QEMU problems in the kernel, as we would miss real
> hardware wreckage that way.

I know that and I'm not proposing it. However sometimes things that look
like QEMU specific problems indicate the real deficiencies of the kernel.

I did more debugging and the reason for marking tsc clocksource as an
unstable is that the reference clocksource (acpi_pm) itself is bad.
The similar problems seem to also happen with the real hardware, i.e.
please see:

http://lkml.indiana.edu/hypermail/linux/kernel/0802.3/0589.html

I cooked up a debug patch which uses timer interval as a reference for
checking stability of the watchdog clocksource (I think it could be useful
for debugging similar issues, it is not a generic solution since it doesn't
handle the situation when both watchdog and watched clocksources are bad).

I'm also wondering whether it would be a good idea to modify our clocksource
watchdog code to just always use timer interval as reference for checking
clocksource stability. It will allow us to check all clocksources (including
watchdog one) and should simplify the code a bit. I can make a patch for it
if you think that this is good idea...

---
kernel/time/clocksource.c | 39 +++++++++++++++++++++++++++++++++++++--
1 file changed, 37 insertions(+), 2 deletions(-)

Index: b/kernel/time/clocksource.c
===================================================================
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -79,8 +79,9 @@ static unsigned long watchdog_resumed;
/*
* Interval: 0.5sec Threshold: 0.0625s
*/
-#define WATCHDOG_INTERVAL (HZ >> 1)
-#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
+#define WATCHDOG_INTERVAL (HZ >> 1)
+#define WATCHDOG_INTERVAL_NSEC 500000000
+#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)

static void clocksource_ratewd(struct clocksource *cs, int64_t delta)
{
@@ -94,6 +95,34 @@ static void clocksource_ratewd(struct cl
list_del(&cs->wd_list);
}

+static int watchdog_check(struct clocksource *wd,
+ int64_t wd_nsec, int64_t cs_nsec)
+{
+ int64_t delta;
+
+ /* check if watchdog seems good */
+ delta = wd_nsec - WATCHDOG_INTERVAL_NSEC;
+ if (delta > -WATCHDOG_THRESHOLD && delta < WATCHDOG_THRESHOLD)
+ return 0;
+
+ /* check if clocksource seems bad */
+ delta = cs_nsec - WATCHDOG_INTERVAL_NSEC;
+ if (delta < -WATCHDOG_THRESHOLD || delta > WATCHDOG_THRESHOLD)
+ return 0;
+
+ printk(KERN_WARNING "Watchdog clocksource %s unstable (delta = "
+ "%lld ns)\n", wd->name, wd_nsec - WATCHDOG_INTERVAL_NSEC);
+
+ wd->flags &= ~(CLOCK_SOURCE_VALID_FOR_HRES | CLOCK_SOURCE_WATCHDOG);
+ clocksource_change_rating(wd, 0);
+
+ /* disable watchdog */
+ watchdog = NULL;
+
+ /* don't add next timer */
+ return 1;
+}
+
static void clocksource_watchdog(unsigned long data)
{
struct clocksource *cs, *tmp;
@@ -135,6 +164,11 @@ static void clocksource_watchdog(unsigne
} else {
cs_nsec = cyc2ns(cs, (csnow - cs->wd_last) & cs->mask);
cs->wd_last = csnow;
+
+ /* Check reference clock stability first. */
+ if (watchdog_check(watchdog, wd_nsec, cs_nsec))
+ goto out_unlock;
+
/* Check the delta. Might remove from the list ! */
clocksource_ratewd(cs, cs_nsec - wd_nsec);
}
@@ -152,6 +186,7 @@ static void clocksource_watchdog(unsigne
watchdog_timer.expires += WATCHDOG_INTERVAL;
add_timer_on(&watchdog_timer, next_cpu);
}
+out_unlock:
spin_unlock(&watchdog_lock);
}
static void clocksource_resume_watchdog(void)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/