Re: Linux timekeeping plans

Colin Plumb
Fri, 4 Dec 1998 06:59:27 -0700 (MST)

I wrote:
>> Slaving the clock does require some care, but because the interrupt
>> latency situation inside a single box is not nearly as messy as
>> internet delays that NTP deals with, the algorithms aren't as

And Stefan Monnier replied:
> I must say (as an NTP user) that I don't understand: why don't you just
> allow xntpd to use the RTC as a local clock and let xntpd slave the system
> clock with it ?

Um, because the RTC is inaccurate. I want NTP to correct it into a good
clock before I use it.

This has advantages even on isolated systems where NTP doesn't make sense.
On laptops and other machines regularly rebooted, you can remember the
RTC correction and then keep stable time across reboots.

>> The one thing that I'm still stuck on is SMP issues. While it appears
>> that the TSC counters are kept synchronized by current SMP PC hardware,
>> this is not guaranteed,

> I seem to remember messages on this list a few weeks back stating fairly
> clearly that several parts of Linux already rely critically on all TSC being
> synchronized (and thus Linux ensuring that it is the case).

So I've been told. But then Poul-Henning Kamp tells me otherwise, and
I'm inclined to believe him, because he's studied the issue in detail.
He says they're synchronous (count at the same rate), but may
have an offset. Not all processors come out of reset at the same
time, and some BIOSes write to the TSC during bootup.

Exciting. And getting them synchronized is a bit exciting, too.

First you have to find the clock difference between master and slave.
This is done by ping-ponging a spinlock and sampling the timestamp
on each side of it. Basically:

volatile int master_lock = 1, slave_lock = 0;

/* On the master: */
for (i = 0; i < 100; i++) {
        /* Send to slave */
        while (slave_lock)      /* Wait for slave to start spinning */
                ;
        nop();  /* Wait a moment for slave to really start spinning */
        master_time[2*i] = rdtsc();
        master_lock = 1;

        /* Receive from slave */
        master_lock = 0;        /* Announce that we're spinning */
        while (!slave_lock)
                ;
        master_time[2*i+1] = rdtsc();
}

/* On the slave: */
for (i = 0; i < 100; i++) {
        /* Receive from master */
        slave_lock = 0;         /* Announce that we're spinning */
        while (!master_lock)
                ;
        slave_time[2*i] = rdtsc();

        /* Send to master */
        while (master_lock)     /* Wait for master to start spinning */
                ;
        nop();  /* Wait a moment for master to really start spinning */
        slave_time[2*i+1] = rdtsc();
        slave_lock = 1;
}

Now, consider the minimum of the even deltas
(slave_time[2*i] - master_time[2*i]), and the minimum of the odd deltas
(master_time[2*i+1] - slave_time[2*i+1]). Like NTP over networks, the
minimum time is the most stable because nothing unexpected happened and
the hardware worked at full speed. If the clocks are in sync, these minima
will be the same. If they aren't, then half of the difference between them
is the amount of clock skew to be corrected.
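
A minimal sketch of that calculation, given the two timestamp arrays
filled in by the loops above. The helper name tsc_offset_estimate and the
signed 64-bit arithmetic are assumptions for illustration, and it assumes
the best-case latency is the same in both directions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: estimate how far the slave's TSC is ahead of the
 * master's (negative means behind) from `rounds` ping-pong exchanges.
 * Both arrays hold 2 * rounds timestamps.
 */
int64_t tsc_offset_estimate(const uint64_t *master_time,
                            const uint64_t *slave_time, size_t rounds)
{
        int64_t min_even = INT64_MAX, min_odd = INT64_MAX;
        size_t i;

        assert(rounds > 0);
        for (i = 0; i < rounds; i++) {
                /* master -> slave: one-way latency plus offset */
                int64_t even = (int64_t)(slave_time[2*i] - master_time[2*i]);
                /* slave -> master: one-way latency minus offset */
                int64_t odd = (int64_t)(master_time[2*i+1] - slave_time[2*i+1]);

                if (even < min_even)
                        min_even = even;
                if (odd < min_odd)
                        min_odd = odd;
        }
        /*
         * If the best-case latency is equal in both directions it
         * cancels here, leaving twice the offset.
         */
        return (min_even - min_odd) / 2;
}
```

A positive result means the slave's TSC is ahead of the master's by that
many cycles; zero means the two minima matched.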

The clocks are likely not to be in sync. In that case, the slave's TSC timer needs
correcting. But writing to the TSC is fraught with hazards, because
you really want to add a correction to it, but you don't know how
many clock cycles the read-modify-write operation takes. So you need
to measure *that* as well.

This is done by starting with an initial estimate of how long it takes.
You add this estimate to the delta you apply in the read-modify-write.
Then you measure the error again. The residual error is the error in
your estimate, so you need to correct your estimate and then use that
to apply the correction. I.e. the total correction on the second pass
is initial_delta_estimate + 2 * residual_error.

You can repeat if necessary to get things exactly balanced.
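
To make the arithmetic concrete, here is a small simulation of the
two-pass correction. Everything in it is hypothetical: struct sim_cpu,
sim_measure_offset and sim_tsc_add stand in for the real slave CPU, the
ping-pong measurement and the TSC write, and the measurement is assumed
to be exact, which real hardware will not give you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated slave CPU; both fields are hidden from the algorithm. */
struct sim_cpu {
        int64_t offset;         /* slave TSC minus master TSC */
        int64_t rmw_cost;       /* true cycles taken by the read-modify-write */
};

/* Stand-in for the ping-pong measurement; exact here, noisy in reality. */
int64_t sim_measure_offset(const struct sim_cpu *cpu)
{
        return cpu->offset;
}

/*
 * Stand-in for "TSC += add": the value read is rmw_cost cycles stale by
 * the time it is written back, so the clock only moves by add - rmw_cost.
 */
void sim_tsc_add(struct sim_cpu *cpu, int64_t add)
{
        cpu->offset += add - cpu->rmw_cost;
}

/* Two-pass correction; returns the improved estimate of the RMW cost. */
int64_t sim_tsc_sync(struct sim_cpu *cpu, int64_t rmw_estimate)
{
        int64_t residual;

        assert(cpu != NULL);

        /* Pass 1: remove the measured offset, padded by our guess. */
        sim_tsc_add(cpu, -sim_measure_offset(cpu) + rmw_estimate);

        /* Pass 2: whatever is left is the error in the guess. */
        residual = -sim_measure_offset(cpu);
        rmw_estimate += residual;
        /* This equals the initial estimate + 2 * residual. */
        sim_tsc_add(cpu, residual + rmw_estimate);

        return rmw_estimate;
}
```

With an initial offset of 1234 cycles, a true cost of 37 and a guess of
20, the first pass leaves the slave 17 cycles behind; the second pass
adds 20 + 2 * 17 = 54, of which 37 is eaten by the write itself, and the
offset lands on zero.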

(Oh, for what it's worth, multiprocessor Alphas do *not* have
synchronous clocks. Each processor has its own independent oscillator.
The oscillators aren't crystals, either, but something less stable, and
they make excellent thermometers.)

