Re: Clock jumps

From: john stultz
Date: Thu May 27 2010 - 20:00:35 EST


On Thu, 2010-05-27 at 23:48 +0200, Bernhard Schmidt wrote:
> On 27.05.2010 21:08, john stultz wrote:
> > I'd be very interested in hearing more about the host side issue. So
> > this happened with the same kernel that you were using before, with no
> > trouble?
>
> Correct.
>
> > Could you also send dmesg output from this boot? And if you can find
> > any older dmesg logs to compare with, send those too?
>
> See http://users.birkenwald.de/~berni/temp/dmesg-lenny and
> http://users.birkenwald.de/~berni/temp/dmesg-squeeze . Although running
> on the same kernel binary the initrd changed greatly when upgrading, so
> ordering/timing between those two is off.
>
> Note that the dmesg output is captured right after boot. I think I
> remember seeing a "TSC unstable" message pretty soon after boot, but I
> might be mixing it up with my other AMD-based KVM server. I don't hold
> normal (non-boot) logs that long, so I can't tell for sure.

Looking at the diff:
--- dmesg-lenny 2010-05-27 16:45:33.000000000 -0700
+++ dmesg-squeeze 2010-05-27 16:46:14.000000000 -0700
@@ -132,8 +132,8 @@
console [ttyS1] enabled
hpet clockevent registered
Fast TSC calibration using PIT
-Detected 2660.398 MHz processor.
-Calibrating delay loop (skipped), value calculated using timer frequency.. 5320.79 BogoMIPS (lpj=10641592)
+Detected 2613.324 MHz processor.
+Calibrating delay loop (skipped), value calculated using timer frequency.. 5226.64 BogoMIPS (lpj=10453296)
Security Framework initialized
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
@@ -160,7 +160,7 @@
CPU0: Intel(R) Xeon(R) CPU 3075 @ 2.66GHz stepping 0b
Booting Node 0, Processors #1
Brought up 2 CPUs
-Total of 2 processors activated (10640.79 BogoMIPS).
+Total of 2 processors activated (10546.63 BogoMIPS).
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xe0000000-0xefffffff] (base 0xe0000000)

So you can see in the above the during the second boot the TSC
calibration was badly mis-calculated. This was the cause of the skew.

Not sure how that might be linked to the distro upgrade. It could have
been something like SMI damage during the calibration time, but I
thought the calibration loop watched for that.

Bernhard: I expect with all those vms, this machine isn't rebooted
frequently. So could you look at the logs to see how much the "Detected
xxxx.yyy MHz processor." line varies by across a few other boots (if
they still exist?).

Ingo/Thomas: Any thoughts, should we be considering dropping the
quick_pit_calibrate() code and always do the slower route?

thanks
-john

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/