Re: [PATCH] hangcheck-timer is broken on x86

From: john stultz
Date: Mon Mar 29 2010 - 17:43:55 EST


On Mon, 2010-03-29 at 17:08 -0400, Yury Polyanskiy wrote:
> >> > What I'm saying is that if you're using getrawmonotonic() to detect
> >> > hangs, you might miss them, as getrawmonotonic may wrap (and thus stop
> >> > continually increasing) if the timer interrupt is delayed. This does not
> >> > apply to systems using the TSC clocksource, but does apply to systems
> >> > using the acpi_pm.
> >>
> >> But if timer interrupt is delayed by more than acpi_pm wrap-around
> >> time, then the update_wall_time() is also screwed. Since it is not, we
> >> can rely on getrawmonotonic().
> >
> > Right, if the box hangs for longer than the clocksource can count for,
> > the timekeeping subsystem will be off by some multiple of that length.
> >
>
> Oh, I see. You mean that getrawmonotonic() wouldn't work under
> abnormal conditions. I understand now, sorry for the confusion. You
> are correct, of course.

Something else I thought of: while the TSC won't wrap, the
multiplication done to convert cycles to nanoseconds will overflow once
the cycle delta gets large enough. So even TSC systems are not
guaranteed to have timekeeping (and thus getrawmonotonic()) work over
arbitrarily long intervals without accumulation.
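
To make the overflow concrete, here is a minimal userspace sketch of a
mult/shift style cycles-to-nanoseconds conversion. The function names
and the mult/shift values used below are made up for illustration and
are not taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the largest cycle delta for which
 * delta * mult still fits in 64 bits.
 */
uint64_t max_safe_cycles(uint32_t mult)
{
    assert(mult != 0);
    return UINT64_MAX / mult;
}

/*
 * Scaled conversion: ns = (cycles * mult) >> shift.
 * The product silently wraps modulo 2^64 if cycles is too large,
 * so the result is only meaningful for cycles <= max_safe_cycles(mult).
 */
uint64_t cyc_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (cycles * mult) >> shift;
}
```

With an illustrative mult of 1 << 22 and shift of 22, any delta up to
2^42 - 1 cycles converts correctly, while a delta of exactly 2^42 cycles
wraps the product and comes back as 0 ns.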

We try to establish this length via timekeeping_max_deferment(), so that
we make sure we don't go into tickless mode for longer than the
clocksource can handle.
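
Roughly, the limit is the smaller of two bounds: how far the counter can
run before it wraps, and how large a delta can be multiplied without
overflowing. The sketch below is only an illustration of that idea, not
the kernel's implementation; the helper names are hypothetical, and the
24-bit mask in the usage note is just an example value for a narrow
counter.

```c
#include <stdint.h>

/*
 * Masked delta, as read from a counter that is only 'mask' wide.
 * Any whole number of wraps between 'last' and 'now' is lost.
 */
uint64_t cycle_delta(uint64_t now, uint64_t last, uint64_t mask)
{
    return (now - last) & mask;
}

/*
 * Hypothetical helper: longest interval (in ns) that can pass between
 * two reads before either the counter wraps or cycles * mult overflows.
 */
uint64_t max_deferment_ns(uint64_t mask, uint32_t mult, uint32_t shift)
{
    uint64_t max_cycles;

    if (mult == 0)
        return 0;

    max_cycles = UINT64_MAX / mult;   /* multiplication overflow bound */
    if (max_cycles > mask)
        max_cycles = mask;            /* counter wrap bound */

    return (max_cycles * mult) >> shift;
}
```

For example, with a 24-bit mask (0xFFFFFF), a counter that really
advanced by 0x1000005 cycles since the last read shows a delta of only
5, which is how a long hang can go unnoticed.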


> I personally don't like the idea of relying on read_persistent_clock()
> not only because of hwclock and ntp. In fact, my core interest in
> hangcheck-timer is to set a very low margin (1 to 3 jiffies for
> example) so that I would get a log message upon any kernel slow down
> or a tick-miss (as a hardware integrity check). I don't think
> read_persistent_clock() is precise enough for this purpose, is it?

read_persistent_clock() is a bit coarse, so it would not do for small
intervals. However, the current timeout range for the hangcheck timer is
in seconds, which should be fine for read_persistent_clock().

You might also have some trouble with small intervals, since tickless
systems or other advanced power-saving features may coalesce or defer
timers to save power. So ticks may be delayed a small amount (timers are
only guaranteed to fire AFTER the time specified; there really is no
promised bound on how late they may be).

Additionally, on -rt systems, you might have higher priority FIFO tasks
blocking the hangcheck timer from executing for a smallish amount of
time.
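
As a rough illustration of why a very small margin is fragile, here is a
sketch of the kind of lateness check a hang detector makes. The function
and parameter names are hypothetical and this is not the hangcheck-timer
code; it only shows that any legitimate delay larger than the margin is
indistinguishable from a hang.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: 'expected_ns' is when the timer was due,
 * 'now_ns' is when the handler actually ran, 'margin_ns' is the
 * tolerated lateness. Timers never fire early, but they may fire late.
 */
bool hang_suspected(uint64_t expected_ns, uint64_t now_ns,
                    uint64_t margin_ns)
{
    if (now_ns <= expected_ns)
        return false;

    return (now_ns - expected_ns) > margin_ns;
}
```

With a margin of a few jiffies, a timer deferred by power management or
held off by a higher priority task would trip this check even though
nothing is actually wrong.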


> Also, hooking to ntp update code complicates an otherwise simple
> driver. I propose to simply check on non-S390 if the clock source
> resolves to something other than TSC and dump a warning message on
> driver load (something like "Hangcheck: kernel using clocksource %s,
> which is not reliable for hang detection").

That requires the hangcheck code to parse the current clocksource, which
might change as the system runs, so it also has to track the clocksource
over time. So I'm not sure it's that much easier of a solution.
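
For comparison, this is roughly what a one-time check looks like from
userspace, reading the current clocksource name from sysfs. It is a
sketch under assumptions, not driver code: the helper names are made up,
and it only captures the clocksource at the moment it runs, which is
exactly the limitation described above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if 'name' (optionally newline-terminated) is exactly "tsc". */
bool clocksource_is_tsc(const char *name)
{
    size_t len = strcspn(name, "\n");

    return len == 3 && strncmp(name, "tsc", 3) == 0;
}

/*
 * Returns 1 if the current clocksource is the TSC, 0 if it is
 * something else, and -1 if the sysfs file could not be read.
 * This is a snapshot; the clocksource may change afterwards.
 */
int current_clocksource_is_tsc(void)
{
    char buf[64];
    FILE *f = fopen("/sys/devices/system/clocksource/clocksource0/"
                    "current_clocksource", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    return clocksource_is_tsc(buf) ? 1 : 0;
}
```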

Something else to consider is the softlockup watchdog, which is fairly
similar but somewhat more deeply integrated into the kernel. Maybe some
of this could be merged?

thanks
-john
