Re: [BUG] Macbook pro timer bug (was: "Patch "CPU hotplug: call check_tsc_sync_source()with irqs off" breaks some drivers")

From: Nicolas Boichat
Date: Mon Mar 26 2007 - 05:55:44 EST


Ingo Molnar wrote:
> * Nicolas Boichat <nicolas@xxxxxxxxxx> wrote:
>
>
>>> I found out which commit seems to cause these bugs:
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d04f41e35343f1d788551fd3f753f51794f4afcf
>>>
>>> The latest GIT without this commit works fine, but doesn't with it.
>>>
>> Sorry about blaming this commit. The problems happen randomly (about 1
>> reboot over 2 is ok, at least with 2.6.21-rc4). I'll run more tests
>> and post the results later.
>>
>
> is this the 32-bit kernel?

Yes it is.

> If yes then does commit
> 4edc5db83f574dfcc8be35b7b96760ded543b360 (included in -rc5) fix it?
>

I don't know precisely which commit you are talking about, but I
upgraded to 2.6.21-rc5, and the problem became more obvious.

>From what I experimented, I came to the conclusion that when the
frequency of the first CPU is wrongly detected, the timers get wrong
(i.e. tsc, hpet, with or without high-res timers).

It is not a new bug, it happened in 2.6.20 too (I also found an old
dmesg from 2.6.18-rc2 with the same frequency detection problem), except
that for some reasons it didn't cause that many obvious bugs, so I
didn't notice it at that time.

(in case you don't remember my first message, I have a Macbook Pro first
generation (Core Duo, so x86), see below for /proc/cpuinfo)

This is the diff of two consecutive soft reboots, with the same kernel
(CONFIG_HPET_TIMER unset, CONFIG_HIGH_RES_TIMERS set):

--- dmesg-2.6.21-rc5-nohpet-error-notime 2007-03-26 16:42:00.000000000 +0800
+++ dmesg-2.6.21-rc5-nohpet-success-notime 2007-03-26 16:42:09.000000000 +0800
@@ -100,7 +100,7 @@
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 16384 bytes)
-Detected 1952.283 MHz processor.
+Detected 1830.920 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)


The "+" boot reports the correct CPU frequency (1.83Ghz).

When you run something like this on both:
# ntpdate swisstime.ethz.ch
# ntpdate swisstime.ethz.ch
# sleep 60
# ntpdate swisstime.ethz.ch

You get, with the "+" boot, what you expect:

26 Mar 16:33:34 ntpdate[3385]: adjust time server 129.132.2.21 offset 0.120391 sec
26 Mar 16:32:52 ntpdate[3376]: adjust time server 129.132.2.21 offset 0.137979 sec
-- pause of about 60 seconds
26 Mar 16:33:55 ntpdate[3386]: adjust time server 129.132.2.21 offset 0.125751 sec


i.e. the clock is still correct 60 seconds after you adjust it.

While, with the "-" boot, you get:

26 Mar 16:18:05 ntpdate[3444]: step time server 129.132.2.21 offset 3.829351 sec
26 Mar 16:18:08 ntpdate[3445]: adjust time server 129.132.2.21 offset 0.232023 sec
-- pause
26 Mar 16:19:16 ntpdate[3447]: step time server 129.132.2.21 offset 4.320653 sec

The clock has shifted of about 4.1 seconds in 60 seconds, i.e. an error
of 6.8%, which is in the same range as the error between 1952 and 1833
Mhz (6.5%).

I get the same results with both CONFIG_HPET_TIMER and
CONFIG_HIGH_RES_TIMERS unset, some boots are ok, some aren't.

I couldn't get a correct boot with these two options enabled, but it's
probably because I didn't reboot enough times...

I you want some detailed kernel logs, see
http://www.boichat.ch/nicolas/linux/2nd/ .
Please tell me if you need more informations, or if you prefer I file a
bug in the bugzilla.

Best regards,

Nicolas

---

/proc/cpuinfo on an incorrect boot:

nunuche ~ # cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 14
model name : Genuine Intel(R) CPU T2400 @ 1.83GHz
stepping : 8
cpu MHz : 1952.216
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc pni monitor vmx est tm2 xtpr
bogomips : 4689.63
clflush size : 64

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 14
model name : Genuine Intel(R) CPU T2400 @ 1.83GHz
stepping : 8
cpu MHz : 1952.216
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc pni monitor vmx est tm2 xtpr
bogomips : 3661.27
clflush size : 64


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/