Re: [PATCH] hangcheck-timer is broken on x86

From: Joel Becker
Date: Mon Mar 29 2010 - 15:55:34 EST


On Mon, Mar 29, 2010 at 11:44:51AM -0700, john stultz wrote:
> > But if timer interrupt is delayed by more than acpi_pm wrap-around
> > time, then the update_wall_time() is also screwed. Since it is not, we
> > can rely on getrawmonotonic().
>
> Right, if the box hangs for longer than the clocksource can count for,
> the timekeeping subsystem will be off by some multiple of that length.
>
> And that's exactly why I'm advising against using
> gettimeofday/getrawmonotonic or any other software managed sense of time
> for the hangcheck timer, as you won't be able to correctly detect hangs.
>
> I'm also suggesting that using something like read_persistent_clock() is
> better, because there is no OS/software management involved (other than
> the minor syncing issue I mentioned before) so if the system hangs for a
> long period of time, then returns, you'll still be able to detect the
> hang.
>
> But maybe what folks are using the hangcheck timer for is shifting, so
> it's possible that I'm not quite understanding what you're trying to do
> here.

The people who use hangcheck-timer for the reasons I originally
wrote it absolutely want any hang, including long ones, detected.
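
To make the wrap-around concern quoted above concrete, here is a minimal
userspace sketch, not kernel code. It assumes a 24-bit free-running counter
ticking at the ACPI PM timer rate of 3579545 Hz (so one full period is about
4.69 seconds); the helper names are hypothetical and exist only for this
illustration. It shows that a reader who only sees masked counter deltas
cannot tell a 2 second stall from a stall of several full periods plus
2 seconds.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 24-bit counter at the ACPI PM timer frequency. */
#define PM_MASK 0xFFFFFFu
#define PM_HZ   3579545u

/* Wrap-safe elapsed ticks between two counter reads. Correct only when
 * less than one full counter period passed between the reads. */
static uint32_t masked_delta(uint32_t prev, uint32_t now)
{
    return (now - prev) & PM_MASK;
}

/* Hypothetical helper: value the counter shows after true_ticks real
 * ticks have elapsed since it read 'start'. */
static uint32_t counter_after(uint32_t start, uint64_t true_ticks)
{
    assert(start <= PM_MASK);
    return (uint32_t)((start + true_ticks) & PM_MASK);
}

/* Ticks a masked reader would observe for a stall of true_ticks. */
static uint64_t observed_ticks(uint64_t true_ticks)
{
    const uint32_t start = 12345u; /* arbitrary starting counter value */
    return masked_delta(start, counter_after(start, true_ticks));
}
```

Calling observed_ticks() with 2 * PM_HZ and with three full periods plus
2 * PM_HZ gives the same answer, which is why a hang detector built on a
software-accumulated clock can under-report a long hang by a multiple of the
counter period.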

Joel

--

"For every complex problem there exists a solution that is brief,
concise, and totally wrong."
-Unknown

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@xxxxxxxxxx
Phone: (650) 506-8127