Re: [PATCH] x86/tsc: RFC: re-synchronize TSCs to boot cpu TSC

From: Thomas Gleixner
Date: Fri Dec 09 2016 - 12:36:18 EST


On Fri, 9 Dec 2016, Roland Scheidegger wrote:
> Am 09.12.2016 um 10:59 schrieb Thomas Gleixner:
> > On Fri, 9 Dec 2016, Roland Scheidegger wrote:
> >>
> >> I saw some system lockups though:
> >> When doing a cold boot, this kernel never managed to boot up. The last
> >> message seen is:
> >> x86: Booting SMP configuration:
> >> .... node #0, CPUs: #1
> >
> > Weird. That really would be interesting to figure out what goes wrong
> > there. What bothers me is that we don't see something like this:
> >
> >> [ 0.172334] TSC ADJUST differs: Reference CPU0: -577421768610 CPU1:
> >> -577423766270
> >
> > Can you please apply the debug patch below and provide the output ?
> Ok, this is the output (minus some typos maybe...):
>
> x86: Booting SMP configuration:
> .... node #0, CPUs: #1
> TSC ADJUST: CPU1: -2806491604
> TSC source sync 0 -> 1 runs 3
> TSC ADJUST differs: Reference CPU0: -2805503200 CPU1: -2806491604
> TSC ADJUST synchronize: Reference CPU0: -2805503200 CPU1: -2806491604
> TSC target sync skip
> TSC source sync skipped
>
> And that's it.

That's a cold boot (power on), right?

That looks like I expected. As we know that the CPUs are in the same
package simply resynchronizing the TSC ADJUST MSR is enough. And the sync
test is skipped as we have resynced TSC ADJUST already to be the same
value.

What's unexpected is that the thing stops working :(

I'm pretty sure the boot cpu hangs in that endless loop waiting for CPU1 to
set the online bit, but for whatever reason this does not happen.

Can you add the patch below to gather more information? There is a hunk in
there with an '#if 0' which sets the TSC ADJUST to 0 on boot, which you can
turn on as second step.

This BIOS seems to be rather strange ...

Thanks.

tglx
----
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -95,7 +95,7 @@ bool tsc_store_and_check_tsc_adjust(void
cur->nextcheck = jiffies + HZ;
cur->warned = false;

- pr_info("TSC ADJUST: CPU%u: %lld\n", cpu, bootval);
+ pr_info("TSC ADJUST: CPU%u: %lld %lld\n", cpu, bootval, rdtsc());

/*
* Check whether this CPU is the first in a package to come up. In
@@ -114,8 +114,15 @@ bool tsc_store_and_check_tsc_adjust(void
* the sync mechanism. If that fails we still can yell
* about boot values not being consistent.
*/
- cur->adjusted = bootval;
pr_info_once("TSC ADJUST: Boot CPU%u: %lld\n", cpu, bootval);
+#if 0
+ if (bootval) {
+ pr_warn("TSC ADJUST: CPU%u: Setting to 0\n", cpu);
+ bootval = 0;
+ wrmsrl(MSR_IA32_TSC_ADJUST, bootval);
+ }
+#endif
+ cur->adjusted = bootval;
return false;
}

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -245,7 +245,9 @@ static void notrace start_secondary(void
* from seeing a half valid vector space.
*/
lock_vector_lock();
+ pr_info("Vector locked\n");
setup_vector_irq(smp_processor_id());
+ pr_info("Vector setup done\n");
set_cpu_online(smp_processor_id(), true);
unlock_vector_lock();
cpu_set_state_online(smp_processor_id());
@@ -257,8 +259,10 @@ static void notrace start_secondary(void
/* to prevent fake stack check failure in clock setup */
boot_init_stack_canary();

+ pr_info("Clock setup\n");
x86_cpuinit.setup_percpu_clockev();

+ pr_info("Enter idle startup\n");
wmb();
cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
}
@@ -1133,7 +1137,7 @@ int native_cpu_up(unsigned int cpu, stru
cpu_relax();
touch_nmi_watchdog();
}
-
+ pr_info("Target CPU is online\n");
return 0;
}