Re: Time Drift Compensation on Linux Clusters

From: Baruch Even
Date: Thu Mar 03 2005 - 11:31:57 EST


Dror Cohen wrote:
> Hi all,
> While working on a Linux cluster with kernel version 2.4.27 we've
> encountered a consistent clock drift problem. We have devised a fix
> for this problem which is based on the Pentium's TSC clock. We'd
> appreciate any comments on the validity of the proposed solution and
> on whether it might cause unexpected (and undesired) problems in other
> parts of the kernel.

That's an interesting solution to a problem I've seen in the past.

One note on submitting to LKML: it is better to send a full patch in unified diff format with context. A good command for this is:
diff -Nurp kernel-old kernel-new

The possible problem with jumping multiple jiffies is that calculations might be skewed. Some are likely done on the assumption that jiffies increases by exactly one per tick, while others look at the difference in jiffies from some point in the past.

For example, the call immediately after the increment of jiffies assumes that a single jiffy has passed, which is not necessarily true. I do not know what effect this might have; probably nothing that did not already happen with the clock skew anyway.
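To make the two styles concrete, here is a minimal userspace sketch. It is not kernel code and not taken from the patch: the `jiffies` variable is a stand-in for the kernel counter, and `arm_timers`, `timer_interrupt`, `countdown` and `expires` are hypothetical names made up for illustration. The countdown is decremented once per interrupt, so it implicitly assumes one jiffy per interrupt; the deadline compares against jiffies and so follows a multi-jiffy jump.

```c
#include <assert.h>

/* Stand-in for the kernel's jiffies counter (illustration only). */
static unsigned long jiffies;

/* Style 1: decremented once per timer interrupt, so it silently
 * assumes that every interrupt corresponds to exactly one jiffy. */
static unsigned long countdown;

/* Style 2: an absolute expiry time compared against jiffies. */
static unsigned long expires;

/* Arm both hypothetical timers with the same timeout, in jiffies. */
static void arm_timers(unsigned long timeout)
{
    assert(timeout > 0); /* a zero timeout expires at once in both styles */
    countdown = timeout;
    expires = jiffies + timeout;
}

/* One simulated timer interrupt that advances jiffies by `step`. */
static void timer_interrupt(unsigned long step)
{
    jiffies += step;
    if (countdown > 0)
        countdown--; /* counts interrupts, not jiffies */
}

static int countdown_expired(void)
{
    return countdown == 0;
}

/* The signed difference keeps the comparison valid across wraparound,
 * the same idea as the kernel's time_after_eq(). */
static int deadline_expired(void)
{
    return (long)(jiffies - expires) >= 0;
}
```

If both timers are armed for 6 jiffies and two interrupts each advance jiffies by 3, the deadline style reports expiry while the countdown still has 4 left. That is the kind of disagreement I mean; whether any real kernel path is hurt by it is something I have not checked.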

What happens to the TSC when the CPU is halted due to idleness? I don't remember exactly, but there are cases when it stops ticking, and then your patch will not advance the clock at all.

You also advance the clock one time too many. The condition should probably be (xtdc_last + xtdc_step < cycles).
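A small userspace sketch of the difference follows. The names xtdc_last, xtdc_step and cycles come from the patch under discussion, but the loop shape and the "loose" condition (xtdc_last < cycles) are my assumption about what the code looks like, not a copy of it. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed original shape: keep crediting a jiffy while the last
 * accounted TSC value is behind the current reading. */
static unsigned int catch_up_loose(uint64_t *xtdc_last, uint64_t xtdc_step,
                                   uint64_t cycles)
{
    unsigned int ticks = 0;

    assert(xtdc_step > 0); /* a zero step would never terminate */
    while (*xtdc_last < cycles) {
        *xtdc_last += xtdc_step;
        ticks++;
    }
    return ticks;
}

/* Suggested shape: credit a jiffy only when a whole further step
 * still fits below the current reading. */
static unsigned int catch_up_strict(uint64_t *xtdc_last, uint64_t xtdc_step,
                                    uint64_t cycles)
{
    unsigned int ticks = 0;

    assert(xtdc_step > 0); /* a zero step would never terminate */
    while (*xtdc_last + xtdc_step < cycles) {
        *xtdc_last += xtdc_step;
        ticks++;
    }
    return ticks;
}
```

With xtdc_last = 0, xtdc_step = 100 and cycles = 250, the loose loop credits 3 jiffies and leaves xtdc_last at 300, ahead of the real TSC value, while the strict loop credits 2 and leaves it at 200. Note that with a strict "<" a reading that lands exactly on a step boundary is credited on the next pass rather than this one.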

There is also the issue of a loop with 64-bit operations and comparisons in the timer interrupt handler. Maybe this should be offloaded to the bottom half (bh)? The cost deserves more thought.

Baruch