Re: [PATCH clocksource 1/3] clocksource: Reject bogus watchdog clocksource measurements

From: Feng Tang
Date: Thu Dec 01 2022 - 20:13:26 EST


On Thu, Dec 01, 2022 at 09:24:05AM -0800, Paul E. McKenney wrote:
> On Wed, Nov 30, 2022 at 02:00:04PM +0800, Feng Tang wrote:
> > On Tue, Nov 29, 2022 at 09:50:24PM -0800, Paul E. McKenney wrote:
> > [...]
> > > >
> > > > Great! As both HPET and PM_TIMER get the same calibration 1975.000 MHz,
> > > > and it matches the 40ms drift per second you mentioned earlier, this
> > > > seems like the CPUID(0x15) gave the wrong frequence number.
> > > >
> > > > Or unlikely, HPET and PM_TIMER are driven by the same circuit source,
> > > > which has deviation.
> > > >
> > > > Either way, I think the HW/FW could have some problem.
> > >
> > > And this time with your printk()s:
> > >
> > > [    0.000000] tsc: using CPUID[0x15] crystal_khz= 24000 kHz ebx=158 eax=2
> > > [    0.000000] tsc: Detected 1900.000 MHz processor
> > > [    0.000000] tsc: Detected 1896.000 MHz TSC
> > > [    5.268858] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
> > > [   25.706231] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x36a8d32ce31, max_idle_ns: 881590731004 ns
> > > [   32.223011] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
> > > [   57.823933] clocksource: Switched to clocksource tsc-early
> > > [   58.144840] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
> > > [   63.613713] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x36a8d32ce31, max_idle_ns: 881590731004 ns
> > > [   63.637323] clocksource: Switched to clocksource tsc
> > > [   64.673579] tsc: Warning: TSC freq calibrated by CPUID/MSR differs from what is calibrated by HW timer, please check with vendor!!
> > > [   64.703719] tsc: Previous calibrated TSC freq:        1896.000 MHz
> > > [   64.716816] tsc: TSC freq recalibrated by [PM_TIMER]:         1974.999 MHz
> >
> > This confirms the tsc frequency is calculated from CPUID(0x15).
> >
> > > What would be good next steps to check up on the hardware and firmware?
> >
> > Maybe raise it to vendor? I have no idea how to check thos black boxes :)
>
> Done, but no high hopes here. (What, me cynical? Better believe it!!!)
>
> > > (My next step involves a pillow, but will follow up tomorrow morning
> > > Pacific Time.)
> >
> > Really thanks for checking this through late night!
>
> No problem, and I guess it is instead the day after tomorrow, but
> I thought you might be interested in chronyd's opinion:
>
> [root@rtptest1029.snc8 ~]# cat /var/lib/chrony/drift
>         40001.074911             0.002098
>
> In contrast, on my Fedora laptop:
>
> $ sudo cat /var/lib/chrony/drift
>             2.074313             0.186606
>
> I am (perhaps incorrectly) taking this to indicate that TSC is in fact
> drifting with respect to standard time.

This info is very useful! It further confirms the CPUID(0x15) gave
the wrong frequency info.

Also I don't think TSC itself is drifting, and the drift some from
the wrong match calculation(1896 MHz), if we give it the correct
number (likely 1975 MHz here), there shouldn't be big chrony drift
like your Fedora laptop.

Thanks,
Feng


> Thoughts?
>
> Thanx, Paul