Re: [PATCH RFC] x86/tsc: Make recalibration default on for TSC_KNOWN_FREQ cases

From: Feng Tang
Date: Mon May 22 2023 - 11:27:51 EST


On Mon, May 22, 2023 at 04:31:28PM +0200, Thomas Gleixner wrote:
> On Mon, May 22 2023 at 21:00, Feng Tang wrote:
> > On Mon, May 22, 2023 at 01:49:53PM +0200, Thomas Gleixner wrote:
> >> > Paul and Rui can provide more info. AFAIK, those problems were raised
> >> > by external customers, so the platforms were already shipped from
> >> > Intel. But I'm not sure whether they are commercial versions or early
> >> > engineering drops.
> >>
> >> So it's at a company which knows how to update firmware, right?
> >
> > Yes. And the recalibration may help to expose the bug quickly.
>
> That should be exposed _before_ crappy firmware is shipped and
> validation can use the command line parameter. I'm tired of this
> constant source of embarrassing stupidity. It's not rocket science to
> catch this before shipping.
>
> And guess what. Making this easy to recover from is just not making the
> situation any better because firmware people will care even less.

I can't argue with that :)

> >> and five lines further down:
> >>
> >> 	/*
> >> 	 * For Atom SoCs TSC is the only reliable clocksource.
> >> 	 * Mark TSC reliable so no watchdog on it.
> >> 	 */
> >> 	if (boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT)
> >> 		setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
> >>
> >> So it's reliable and needs recalibration against hardware which does not
> >> exist.
> >
> > I misunderstood it. When you said 'SOCs which lack legacy hardware',
> > I thought you were referring to those old Merrifield/Medfield things,
> > which may have no HPET/ACPI PM_Timer but an APB timer, and mainly go
> > through the MSR path (tsc_msr.c) for the TSC frequency.
> >
> > In this native_calibrate_tsc(), which touches INTEL_FAM6_ATOM_GOLDMONT
> > and INTEL_FAM6_ATOM_GOLDMONT_D, I dug out one Apollo Lake and one
> > Denverton platform (which match those GOLDMONT models), and they
> > both have 'hpet' and 'acpi_pm' clocksources registered.
>
> So that comment is wrong and that commit log is from fantasy land?
>
> http://lkml.kernel.org/r/1479241644-234277-4-git-send-email-bin.gao@xxxxxxxxxxxxxxx
>
> Clearly the left hand is not knowing what the right hand is doing.

I started working on Atom (Moorestown) around 2008, and had moved to
other platforms by the time of that patch.

And I don't understand the commit log: "On Intel GOLDMONT Atom SoC
TSC is the only reliable clocksource. We mark TSC reliable to avoid
watchdog on it."

Clearly, the Denverton I found today has both HPET and ACPI PM timers:

[root@dnv0 ~]# grep . /sys/devices/system/clocksource/clocksource0/*
/sys/devices/system/clocksource/clocksource0/available_clocksource:tsc hpet acpi_pm
/sys/devices/system/clocksource/clocksource0/current_clocksource:tsc

The lscpu output is:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
Model name: Intel(R) Atom(TM) CPU C3850 @ 2.10GHz
BIOS Model name: Intel(R) Atom(TM) CPU C3850 @ 2.10GHz CPU @ 2.1GHz
BIOS CPU family: 43
CPU family: 6
Model: 95
Thread(s) per core: 1
Core(s) per socket: 12
Socket(s): 1
Stepping: 1

Maybe this CPU model (0x5F) has been used in some platforms that
hit the false-alarm watchdog issue.

Thanks,
Feng

> Thanks,
>
> tglx