Re: Losing too many ticks! TSC cannot be used as time sourc

From: Petr Vandrovec
Date: Wed Nov 19 2003 - 18:13:45 EST


On 19 Nov 03 at 14:37, john stultz wrote:
> On Wed, 2003-11-19 at 03:21, Petr Vandrovec wrote:
> > Hi,
> > today morning my kernel (2.6.0-test9-something) said that
> > TSC became unusable. There are no other messages around this
> > time in the log. It could be thermal throttling, but it should
> > print some message when X86_MCE_P4THERMAL is enabled, yes?
> > It happened after 17hrs 56min of uptime. System never produced
> > this message before.
>
> Hmmm. Interesting. You haven't seen it before and you've been running
> 2.6.0-testX for awhile?

I do not know when this monitoring entered kernel, but system is running
2.6.0-testX kernels since July 22, when it booted 2.6.0-test1-c1534...

> Any heavy cron jobs running at that time?

atsar says:
cpu %usr %sys %nice %idle pswtch/s runq nrproc lavg1 ..5 ..15
08:30:01 all 3 1 0 95 134 3 125 0.03 0.03 0.00
08:40:01 all 3 1 0 95 108 4 125 0.07 0.05 0.01
08:50:01 all 3 1 0 95 83 4 123 0.03 0.03 0.00

High process switch rate is caused by running teletext grabber. There is
peak at 6:30-6:50 caused by cron.daily.

> The message is caused after 100 consecutive timer ticks where it appears
> that the system has lost ticks. The assumption is that something has
> gone wrong and we can no longer trust the TSC as a time source
> (speedstep, for instance). Thermal throttling is a possibility, but I've
> not actually seen it occur. I'd have to defer to the cpufreq folks on
> that one.

bash# grep FREQ .config
# CONFIG_CPU_FREQ is not set
bash#

> If we're getting false positives, we may have to bump that
> 100-consecutive-ticks number up.
>
> Anything else quirky about the system?

I'm not aware of any problem. Three or four months ago I reported memory
corruption in one of bitkeeper files (32bits replaced with something
else), but it did not occured again since then.

> > Are there some more information I could supply, or should I
> > simple live with fact that TSC stopped working on my P4 today
> > morning?
>
> Well, the TSC didn't stop working, we just stopped using it as a time
> source. Things should work fine using just the PIT, although I'd be
> interested to hear if ntpd finally settled down after the change. If it
> cannot stay synced it may be an issue w/ your system's PIT.

It did not produce any message since 9:54 (>14 hrs ago). Only difference
is that in the past /var/lib/ntp/ntp.drift listed value 4.496 to 4.967
(looking at all values I have logged in /var/log/syslog), but now
it says 45.662. But what I can expect from PIT...

> You may also want to check the ACPI PM timesource code in -mm4, and see
> how that works for you.

Sometime ;-) Maybe if it will hit Linus kernel ;-)
Petr Vandrovec


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/