PROBLEM: clock warps +/- 4294 seconds on 2.2.7 w/SMP x86

dave madden (dhm@webvision.com)
Wed, 12 May 1999 21:26:44 -0700


[1.] clock warps +/- 4294 seconds on 2.2.7 w/SMP x86
[2.] On a dual-CPU PIII-450 system, the time returned by
gettimeofday() warps forward 4294 seconds every 1-2 seconds, then
jumps back again.
[3.] gettimeofday, clock warp, multiprocessor pentium TSC mismatch
[4.] Linux version 2.2.7 (root@vheissu.webvision.com) (gcc version
2.7.2.3) #20 SMP Wed May 12 20:47:49 PDT 1999
[5.] (No OOPS)
[6.] (No shell program to trigger)
[7.] Intel N440BX server motherboard, 2xPIII-450, 256MB memory
[7.1.] Software (add the output of the ver_linux script here)
[7.2.] dhm@vheissu(1)$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 7
model name : 00/07
stepping : 2
cpu MHz : 448.881356
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx osfxsr kni
bogomips : 447.28

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 7
model name : 00/07
stepping : 2
cpu MHz : 448.881356
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx osfxsr kni
bogomips : 447.28

[7.3.] Module information (from /proc/modules):
[7.4.] SCSI information (from /proc/scsi/scsi)
[7.5.]

I recently started having trouble with X on a dual-proc PIII-450: the
X server calls gettimeofday() periodically, and the time returned
would warp forward 4294 seconds every 1-2 seconds, then warp back
almost immediately. The forward warp would kick off the server's
screen saver, which was a major annoyance, and I was also worried
that file timestamps would be screwed up.
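
For anyone who wants to check for the same symptom from user space,
here is a rough sketch rather than a tested reproducer. It samples
gettimeofday() back to back and counts consecutive readings that
differ by more than a threshold in either direction. The helper names
timeval_diff_usec() and count_warps() are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Signed difference (b - a) in microseconds. */
static int64_t timeval_diff_usec(const struct timeval *a,
                                 const struct timeval *b)
{
    return ((int64_t)b->tv_sec - (int64_t)a->tv_sec) * 1000000
         + ((int64_t)b->tv_usec - (int64_t)a->tv_usec);
}

/* Take `samples` back-to-back readings and count how many consecutive
 * pairs differ by more than threshold_usec, forwards or backwards. */
static int count_warps(int samples, int64_t threshold_usec)
{
    struct timeval prev, cur;
    int warps = 0;

    assert(samples > 0 && threshold_usec > 0);
    gettimeofday(&prev, NULL);
    for (int i = 0; i < samples; i++) {
        gettimeofday(&cur, NULL);
        int64_t d = timeval_diff_usec(&prev, &cur);
        if (d > threshold_usec || d < -threshold_usec)
            warps++;
        prev = cur;
    }
    return warps;
}
```

Calling count_warps(1000000, 1000000) should normally return 0 on a
healthy clock; on the machine described here, a loop that runs for a
few seconds would be expected to see both a forward and a backward
jump of roughly 4294 seconds.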

After a little digging, I found that the Pentium TSC register is being
used to give accurate microseconds in struct timeval, and guessed that
if the two CPUs' TSCs were slightly out of step, you'd occasionally
get a huge value for the count differential: when the call runs on the
CPU whose TSC is behind the value saved at the last timer interrupt,
the unsigned 32-bit subtraction wraps around. (I should have realized
right away that 4294 is what you get when you divide 2^32 by 1e6, but
anyway...)
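
The arithmetic behind that guess can be sketched in a few lines of
user-space C. This only illustrates unsigned 32-bit wraparound and the
2^32 / 1e6 figure; it does not reproduce the kernel's conversion from
cycles to microseconds. The helper names and the sample TSC values are
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring "eax -= last_tsc_low": the subtraction
 * is done in unsigned 32-bit arithmetic, so it wraps modulo 2^32. */
static uint32_t tsc_low_delta(uint32_t tsc_low_now, uint32_t last_tsc_low)
{
    return tsc_low_now - last_tsc_low;
}

/* Whole seconds contained in a microsecond count (hypothetical helper). */
static uint32_t usec_to_whole_sec(uint64_t usec)
{
    return (uint32_t)(usec / 1000000u);
}

/* Worked example; every number here is invented. */
static void wrap_example(void)
{
    uint32_t last   = 1000000u;      /* low TSC word saved at the last tick */
    uint32_t ahead  = last + 2500u;  /* same CPU: small positive delta */
    uint32_t behind = last - 300u;   /* other CPU's TSC lags by 300 cycles */

    assert(tsc_low_delta(ahead, last) == 2500u);
    /* A 300-cycle lag shows up as 2^32 - 300, not as -300. */
    assert(tsc_low_delta(behind, last) == 4294966996u);
    /* 2^32 microseconds is a little over 4294 seconds. */
    assert(usec_to_whole_sec(1ULL << 32) == 4294u);
}
```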

I added this sanity check to .../arch/i386/kernel/time.c:

	/* .. relative to previous jiffy (32 bits is enough) */
	eax -= last_tsc_low;	/* tsc_low delta */

+	/* implausibly large delta: assume the TSCs disagree, ignore it */
+	if (eax > 5000000)
+		return delay_at_last_interrupt;

and it seems to have cured my problem, but it's clearly just a hack.
I don't know how either to keep the TSCs in sync or to ensure that the
gettimeofday code is always executed by the same CPU; if neither is
possible, then perhaps this is the best way to handle it.
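
To make the effect of the check easier to see outside the kernel, here
is a user-space model of it. This is a sketch under assumptions, not
the kernel routine: gettimeoffset_usec() and its cycles_per_usec
parameter are hypothetical, and it uses a plain division where the
kernel multiplies by a precomputed quotient. Only the 5000000
threshold and the delay_at_last_interrupt fallback come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PLAUSIBLE_DELTA 5000000u  /* threshold taken from the patch */

/* Microseconds since the last timer tick, with the sanity check applied.
 * Hypothetical user-space model; not the kernel's own function. */
static uint32_t gettimeoffset_usec(uint32_t tsc_low_now,
                                   uint32_t last_tsc_low,
                                   uint32_t cycles_per_usec,
                                   uint32_t delay_at_last_interrupt)
{
    assert(cycles_per_usec != 0);

    /* Unsigned subtraction: a TSC that is behind wraps to a huge value. */
    uint32_t delta = tsc_low_now - last_tsc_low;

    /* The check: distrust any delta larger than the threshold. */
    if (delta > MAX_PLAUSIBLE_DELTA)
        return delay_at_last_interrupt;

    /* Assumption: simple division instead of the kernel's multiply. */
    return delay_at_last_interrupt + delta / cycles_per_usec;
}
```

At roughly 449 cycles per microsecond, 5000000 cycles is about 11.1
ms, slightly more than one 10 ms timer tick (assuming HZ=100), so a
legitimate delta within a single jiffy still passes. The cost is that
a rejected reading falls back to the time of the last tick, so the
clock can briefly appear to stand still.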

regards,
d.
