Re: [GIT pull] x86/timers for 4.10

From: Thomas Gleixner
Date: Wed Feb 08 2017 - 06:45:31 EST


On Mon, 6 Feb 2017, Olof Johansson wrote:
> [ 0.177102] [Firmware Bug]: TSC ADJUST differs: Reference CPU0:
> -6495898515190607 CPU1: -6495898517158354

Yay, another "clever" BIOS ....

> [ 0.177104] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -6495898517158354
> [2325258.877496] #2
> [ 0.257232] [Firmware Bug]: TSC ADJUST differs: Reference CPU0:
> -6495898515190607 CPU2: -6495898516849701
> [ 0.257234] TSC ADJUST synchronize: Reference CPU0: 0 CPU2: -6495898516849701
> [2325258.957514] #3
>
> (Once SMP bringup is done, system settles down at the 232525x printk timestamps)


> 1) Timestamp jumps around during SMP bringup.
>
> 2) Timestamp jumps forward a lot. That timestamp is ~26 days, which is
> likely the last cold boot of the system, similar to the original
> reports.
>
> I do find it somewhat annoying when printk timestamps aren't 0-based
> at boot, but can cope with it. Not sure if it's intended behavior
> though?

More resulting behaviour than intended.

The first goal was to fixup that TSC_ADJUST mess because it causes
malfunction of the TSC deadline timer and we don't know which other side
effects the broken hardware implementation has.

Aside of preventing the wreckage TSC becomes usable on those machines which
makes a massive performance difference especially on such big irons where
otherwise all CPUs (in your case 56) compete for HPET access....

> And the jumping around definitely seems to not be.

Bah. You got a working TSC and instead of enjoying the performance boost
you complain about timestamps in dmesg :)

But yes, it's not intended and I really did not pay attention to that.

Looking deeper, it's trivial to fixup the forward jump, but it would
require a lot of nasty hackery to make the "Firmware Bug" timestamps
behave.

The result with the patch below is:

[ 0.397445] smp: Bringing up secondary CPUs ...
[ 0.402100] x86: Booting SMP configuration:
[ 0.406343] .... node #0, CPUs: #1
[1265776479.930667] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU1: -2978888639183101
[1265776479.944664] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -2978888639183101
[ 0.508119] #2
[1265776480.032346] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU2: -2978888639183677
[1265776480.044192] TSC ADJUST synchronize: Reference CPU0: 0 CPU2: -2978888639183677
[ 0.607643] #3
[1265776480.131874] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU3: -2978888639184530
[1265776480.143720] TSC ADJUST synchronize: Reference CPU0: 0 CPU3: -2978888639184530
[ 0.707108] smp: Brought up 1 node, 4 CPUs
[ 0.711271] smpboot: Total of 4 processors activated (21698.88 BogoMIPS)

and frankly I think it's not worth the trouble of buffering and delayed
printing to make the timestamps for these [Firmware Bug] lines
adjusted.

It's a firmware bug, so the timestamp being bonkers is just collateral
damage. Tell your hardware vendor to fix that crap.

> If someone cares, hardware is a Dell T7810 with 2x E5-2663 v3, BIOS
> date 03/09/2016.

Yet another Dell...

Thanks,

tglx

8<----------------------------

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index e41af597aed8..37e7cf544e51 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1356,6 +1356,9 @@ void __init tsc_init(void)
(unsigned long)cpu_khz / 1000,
(unsigned long)cpu_khz % 1000);

+ /* Sanitize TSC ADJUST before cyc2ns gets initialized */
+ tsc_store_and_check_tsc_adjust(true);
+
/*
* Secondary CPUs do not run through tsc_init(), so set up
* all the scale factors for all CPUs, assuming the same
@@ -1386,8 +1389,6 @@ void __init tsc_init(void)

if (unsynchronized_tsc())
mark_tsc_unstable("TSCs unsynchronized");
- else
- tsc_store_and_check_tsc_adjust(true);

check_system_tsc_reliable();