Re: [PATCH] x86/tsc: Add option to force HW timer based recalibration

From: Thomas Gleixner
Date: Mon May 09 2022 - 09:03:32 EST


Feng,

On Mon, May 09 2022 at 19:22, Feng Tang wrote:
> On Mon, May 09, 2022 at 12:01:42PM +0200, Thomas Gleixner wrote:
>> > This option is more like a way to double-check the correctness of
>> > tsc-freq got from MSR/CPUID(0x15).
>>
>> If at all it's a workaround for the inability and ignorance of firmware
>> people. The crystal frequency and the TSC/crystal ratio _are_ known to
>> the system designer and firmware people. It's really not asked too much
>> to put the correct values into CPUID(0x15) and have proper quality
>> control to ensure the correctness.
>>
>> The whole argument about early firmware and pre-production hardware is
>> bogus. It's 2022 and we are debating this problem for more than a decade
>> now and still hardware and firmware people think they can do what they
>> want and it all can be "fixed" in software. It's not rocket science to
>> get this straight.
>
> I completely understand it, as we've also suffered a lot from such
> problems. This patch doesn't change any current work flow, and it simply
> calibrates and prints out the freq info (warn if there is big deviation).
> It acctually provides SW guys a quick way to argue with HW/FW people:
> "See! You've given us a wrong number, please fix it", otherwise I heard
> there was customer long ago who used atomic clock to prove the
> deviation.

Then please say clearly in the changelog what this is about.

"Currently when HW provides the tsc freq info through MSR or
CPUID(0x15), the info will be taken as the 'best guess', and kernel
will set the X86_FEATURE_TSC_KNOWN_FREQ flag and skip the HW timer
based recalibration, which works pretty well.

And there is still very few corner case that the freq info is not
accurate enough will small deviation from the actual value, like on
a product with early version (fix needed) of firmware or some
pre-production hardware.

Add..."

versus:

"The kernel assumes that the TSC frequency which is provided by the
hardware / firmware via MSRs or CPUID(0x15) is correct after applying
a few basic consistency checks. This disables the TSC recalibration
against HPET or PM timer.

As a result there is no mechanism to validate that frequency in cases
where a firmware or hardware defect is suspected.

Add..."

Can you spot the difference in intention? The first one sounds like:

We need to tolerate the hardware/firmware induced crap.

The second one says:

Provide a mechanism to validate because we cannot trust hardware /
firmware.

Hmm?

>> Aside of that HPET has become unrealiable and PM timer is not guaranteed
>> to be there either. So we really do not need a mechanism to enforce
>> recalibration against something which is not guaranteed to provide
>> sensible information.
>
> Correct. The HPET on new client platforms turn to be disabled for the
> PC10 issue, though it's fine on server platforms where tsc accuracy is
> more important.

TSC accuracy is important in any case. Why would it be more important on
server platforms? Just because?

> Also even for the disabled HPET case, I remembered that you've once
> suggested to leverage its capability for calibration, and only disable
> it before cpu idle framework really starts :)

Correct, but that would only be valid for early boot calibration and not
accross the recalibration.

That's why we ended up disabling HPET early in case of PC10. See
hpet_is_pc10_damaged().

Thanks,

tglx