Re: [PATCH] hangcheck-timer is broken on x86

From: john stultz
Date: Mon Mar 29 2010 - 14:45:01 EST


On Mon, 2010-03-29 at 13:04 -0400, Yury Polyanskiy wrote:
> On Mon, 29 Mar 2010 09:43:27 -0700
> john stultz <johnstul@xxxxxxxxxx> wrote:
>
> > > I am not sure which archs do you mean. But in any case,
> > > getrawmonotonic() is not just a wrap around a call to rdtsc() (or acpi
> > > pm timer read). It is based on the clock->raw_time, which is updated
> > > every timer interrupt by the update_wall_time(). So even if underlying
> > > timer wraps, it doesn't lead to getrawmonotonic() returning 0 sec.
> >
> > What I'm saying is that if you're using getrawmonotonic() to detect
> > hangs, you might miss them, as getrawmonotonic may wrap (and thus stop
> > continually increasing) if the timer interrupt is delayed. This does not
> > apply to systems using the TSC clocksource, but does apply to systems
> > using the acpi_pm.
>
> But if timer interrupt is delayed by more than acpi_pm wrap-around
> time, then the update_wall_time() is also screwed. Since it is not, we
> can rely on getrawmonotonic().

Right, if the box hangs for longer than the clocksource can count,
the timekeeping subsystem will be off by some multiple of that length.

And that's exactly why I'm advising against using
gettimeofday/getrawmonotonic or any other software-managed sense of time
for the hangcheck timer, as you won't be able to correctly detect hangs.

I'm also suggesting that something like read_persistent_clock() is a
better fit, because there is no OS/software management involved (other
than the minor syncing issue I mentioned before), so if the system hangs
for a long period of time and then returns, you'll still be able to
detect the hang.

But maybe what folks are using the hangcheck timer for is shifting, so
it's possible that I'm not quite understanding what you're trying to do
here.

thanks
-john
