Re: [FALSE ALARM] Re: HPET (?) related hangs and breakage in 2.6.35,36

From: Andrew Lutomirski
Date: Wed Nov 10 2010 - 15:56:36 EST


On Wed, Nov 10, 2010 at 3:52 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Wed, 10 Nov 2010, Andrew Lutomirski wrote:
>
>> On Wed, Nov 10, 2010 at 1:50 PM, Borislav Petkov <bp@xxxxxxxxx> wrote:
>> > On Wed, Nov 10, 2010 at 01:48:00PM -0500, Andrew Lutomirski wrote:
>> >> > Clocksource: tsc unstable (delta = -34355296774 ns)
>> >> > Switching: to clocksource hpet
>> >>
>> >> Please disregard -- this is a bug in nouveau (or drm) not hpet.  I'll
>> >> send a bug report to the maintainers.
>> >
>> > Interesting! Joerg was complaining about similar symptoms with .36 today
>> > too.
>>
>> Well, there is a clocksource sort-of-bug that could cause confusion:
>> when something totally unrelated to clocksources goes out to lunch,
>> the clocksource watchdog decides that the clocksource is unstable and
>> complains, steering everyone toward filing the wrong bug.
>
> How should the clocksource watchdog code know that something went to
> lunch? The fact that we need to monitor TSC at all is horrible enough,
> adding further heuristics to detect extended lunch breaks would be
> just a PITA.

Could the clocksource watchdog detect when it gets woken up (i.e. when
the hardware timer fires) instead of when it gets scheduled? It is
internal to the timing code, after all...

Alternatively, maybe the watchdog could just compare the TSC timestamp
to the current value according to some other clock (PIT? whatever
clockevent is in use?) instead of just using the time passed into the
delayed work in the first place.

>
> Maybe we could print a different warning when we see large negative
> deltas, which is the main indicator for the system being stuck for
> quite a time while TSC advances happily.

That would be an easy fix.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/