Re: /proc or ps tools bug? 2.6.3, time is off

From: George Anzinger
Date: Thu May 06 2004 - 19:37:09 EST


john stultz wrote:
On Mon, 2004-05-03 at 23:12, Tim Schmielau wrote:

On Mon, 3 May 2004, john stultz wrote:

This patch polishes up Tim Schmielau's (tim@xxxxxxxxxxxxxxxxxxxxxx) fix
for jiffies_to_clock_t() and jiffies_64_to_clock_t(). The issues
observed was w/ /proc output not matching up to wall time due to
accumulated error caused by HZ not being exactly 1000 on i386 systems.
The solution is to correct that error by using the more accurate
TICK_NSEC in our calculation.

I wonder whether it's conceptually correct to use jiffies for accurate long-time measurements at all. ntpd is there for a reason. Using both
corrected, accurate and freely running clocks IMHO is calling for trouble. This might be something to think about for 2.7.


Indeed. Moving away from jiffies as a time counter and more of an
interrupt counter is important. That allows for implementations of
variable HZ and other things the high-res timer folks want without
affecting the time keeping code.

Roughly, I'd like to see the time code for all arches in 2.7 to look
like:

u64 system_time /* NTP adjusted nanosecs since boot */
u64 wall_time_offset /* offset to system_time for time of day */
u64 offset_base /* last read raw hw value */

Hm. In 2.6 we use an NTP adjusted wall time and a wall_to_monotonic offset. I don't really see the advantage here. Does this change buy us something?

For what its worth, I introduced the wall_to_monotonic offset just because it was easier to do (and understand, I think) in the current kernel.

ts_read(): returns the raw cycle value from the hardware timesource
(TSC/ACPI PM/HPET)
ts_delta(now, then):
returns the difference between two raw cycle values
ts_cyc2ns(cycles):
converts a cycle value to ns

monotonic_clock(): returns NTP adjusted nanoseconds since boot
ie: system_time +
NTP_GUNK(ts_cyc2ns(ts_delta(ts_read(),offset_base)))
gettimeofday(): returns monotonic_clock() + sys_time_offset
settimeofday(): adjusts only sys_time_offset
time_interrupt_hook(): updates system_time. called by timer interrupt atleast once
every hardware cycle (ie: before the hardware counter
overflows), but otherwise unaffected by lost interrupts, etc.
ie: then = offset_base
now = ts_read()
system_time += NTP_GUNK(ts_cyc2ns(ts_delta(now, then)));
DO_MORE_NTP_GUNK()

And ignoring the magic NTP_GUNK macros, that's all there is to it
(Although don't kid your self, the NTP_GUNK is nasty).

Right, and it needs to be recast to use secs and nanosecs... But you forget the accounting code which needs the periodic interrupt to charge time to whom ever.

Of course, with this approach, we actually have to be able to trust the
hardware 100%. With the current state of i386 hw having serious problems
w/ reliable timesources, this may be difficult.

Yes, and there is also a problem getting a stable, reliable, and correct calibration of TSC to PIT even with a constant TSC rate. In the HRT patch I finally resorted to correcting the TSC last read value on a regular basis. With out this it drifts (or maybe, more correctly, the calibration was wrong) enough to mess up the high res timers.

I suspect that while we use two different timers (PIT & TSC or PIT & pm-timer or ...) that don't use the same source clock we will continue to have such problems. Other archs have a much easier time of it.

I've got a bigger proposal (with proper credits to Keith Mannthey and
George Anzinger for reviews and corrections) that I wrote up awhile
back, and I'll likely send it out if this sketch gathers any interest.


--
George Anzinger george@xxxxxxxxxx
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/