Re: [PATCH RESEND] x86/tsc: print some log if calibrated tsc freq deviates from original too much

From: Lei Chen
Date: Wed Apr 10 2024 - 00:53:15 EST


On Tue, Apr 9, 2024 at 4:05 PM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
>
> * Lei Chen <lei.chen@xxxxxxxxxx> wrote:
>
> > In most cases, tsc_khz is refined by hpet on boot. But in a few
> > production-level nodes, the refinement fails because calibrated
> > freq diviates from origin tsc freq more than 1%. Printing some
> > logs will help get this info.
> >
> > Signed-off-by: Lei Chen <lei.chen@xxxxxxxxxx>
> > ---
> > arch/x86/kernel/tsc.c | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index 15f97c0abc9d..a68b16e72df1 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1435,8 +1435,15 @@ static void tsc_refine_calibration_work(struct work_struct *work)
> > }
> >
> > /* Make sure we're within 1% */
> > - if (abs(tsc_khz - freq) > tsc_khz/100)
> > + if (abs(tsc_khz - freq) > tsc_khz/100) {
> > + pr_warn("Warning: TSC freq calibrated by [%s]: %lu.%03lu MHz deviates too much from original freq: %lu.%03lu MHz\n",
>
> Yeah, so it wouldn't cost us anything to more precisely define 'too much':
>
> s/deviates too much from
> /deviates by more than 1% from
>
> Right?

Thanks for your suggestion, I'll change it.


>
> > + hpet ? "HPET" : "PM_TIMER",
> > + (unsigned long)freq / 1000,
> > + (unsigned long)freq % 1000,
> > + (unsigned long)tsc_khz / 1000,
> > + (unsigned long)tsc_khz % 1000);
> > goto out;
> > + }
>
> The warning makes sense I suppose, if it's one per system and once per
> bootup [right?], but I think pr_info() would be plenty enough priority for
> this condition - especially as we didn't have the warning before and don't
> know how frequent it is?
>
I'm afraid tsc frequency deviates from HPET more than 1% is a system risk.
It means either HPET or tsc might be not stable. Users should be cautious of
the risk and turn to ntp for help. In our production nodes, it indeed
causes more
time skew. So maybe warning is a better way to expose the risk ?