Re: Time Drift Compensation on Linux Clusters

From: Andi Kleen
Date: Thu Mar 03 2005 - 12:42:04 EST


Dror Cohen <dror.xiv@xxxxxxxxx> writes:

> Hi all,
> While working on a Linux cluster with kernel version 2.4.27 we've
> encountered a consistent clock drift problem. We have devised a fix

The normal fix is to use NTP.

Clock drift caused by lost timer ticks is a known problem, but I don't
think your change is the right fix.

>
> In IO intensive environments many CPU cycles are spent handling interrupts.
> This is done by the device driver code for the different IO devices.
> Typically, while dealing with the hardware, device drivers block other
> IRQs, including the PIT's (timer) IRQ.

Drivers are not supposed to block interrupts for a long time.
If they do in your environment, it would be better to fix the drivers.

> Each time a timer interrupt is missed the system loses one jiffy.
> These lost jiffies accumulate into a consistent clock drift.

It doesn't. jiffies is not used in gettimeofday(); only xtime and
wall_jiffies are. Fixing up jiffies would at best protect you against an
"uptime drift" and make the kernel's internal timers a bit more accurate.

Did you really test your code? I bet it doesn't help.

In general, using the TSC like this is also dangerous, because
there are systems where its rate is not stable (e.g. it can vary
when the CPU changes to lower frequencies for power saving).
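
To make the TSC concern concrete, here is a rough user-space sketch of
the arithmetic. The 2 GHz calibration, HZ = 100, and both helper names
are assumptions picked for illustration; none of this is taken from
your patch or from the kernel.

```c
#include <stdint.h>

/* Assumed calibration: a 2 GHz TSC and HZ = 100, i.e. 20,000,000
 * cycles per timer tick. Both values are hypothetical. */
#define ASSUMED_TSC_HZ  2000000000ULL
#define ASSUMED_HZ      100ULL
#define CYCLES_PER_TICK (ASSUMED_TSC_HZ / ASSUMED_HZ)

/* Ticks inferred from a TSC delta, trusting the calibrated rate. */
static uint64_t ticks_from_cycles(uint64_t delta_cycles)
{
    return delta_cycles / CYCLES_PER_TICK;
}

/* Cycles the TSC advances during `seconds` of real time when the CPU
 * actually runs at `actual_hz`. */
static uint64_t cycles_elapsed(uint64_t actual_hz, uint64_t seconds)
{
    return actual_hz * seconds;
}
```

With these numbers, one real second at the calibrated 2 GHz is counted
as 100 ticks, but one real second after the CPU has dropped to 1 GHz is
counted as only 50, so any compensation based on a fixed
cycles-per-tick value would be wrong by a factor of two.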

BTW, when you submit patches, send them in diff -u format, ready
to apply. The form you used is very unfriendly to reviewers and users.

> > #ifdef X86_TSC_DRIFT_COMPENSATION
> > __u64 cycles = get_cycles();
> > while (xtdc_last < cycles) {
> > (*(unsigned long *)&jiffies)++;

This is also wrong because jiffies is really a 64-bit variable.
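
A small user-space sketch of what goes wrong, assuming a 32-bit build
where unsigned long is 32 bits and the cast reaches only the low word
of a 64-bit counter. The function names are made up for illustration.

```c
#include <stdint.h>

/* Models incrementing only the low 32 bits of a 64-bit counter, which
 * is what a cast to a 32-bit unsigned long would do. The upper half
 * is preserved and never receives a carry. */
static uint64_t bump_low_word(uint64_t counter)
{
    uint64_t high = counter & 0xFFFFFFFF00000000ULL;
    uint32_t low = (uint32_t)counter;

    low++;              /* wraps from 0xFFFFFFFF to 0 without a carry */
    return high | low;
}

/* A proper 64-bit increment, for comparison. */
static uint64_t bump_full(uint64_t counter)
{
    return counter + 1;
}
```

The two agree until the low word wraps: from 0xFFFFFFFF the full
increment gives 0x100000000, while the low-word increment gives 0.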

-Andi