[PATCH 2/2] x86/tsc_sync: Add synchronization overhead to tsc adjustment

From: Waiman Long
Date: Mon Mar 14 2022 - 15:47:23 EST


As stated in the comment of check_tsc_sync_target():

The adjustment value is slightly off by the overhead of the
sync mechanism (observed values are ~200 TSC cycles), but this
really depends on CPU, node distance and frequency. So
compensating for this is hard to get right.

That overhead, however, can cause the TSC adjustment to fail after 3
test runs, as can be seen when booting up a hot 4-socket Intel CooperLake
system:

[ 0.034090] TSC deadline timer available
[ 0.008807] TSC ADJUST compensate: CPU36 observed 95626 warp. Adjust: 95626
[ 0.008807] TSC ADJUST compensate: CPU36 observed 74 warp. Adjust: 95700
[ 0.974281] TSC synchronization [CPU#0 -> CPU#36]:
[ 0.974281] Measured 4 cycles TSC warp between CPUs, turning off TSC clock.
[ 0.974281] tsc: Marking TSC unstable due to check_tsc_sync_source failed

To prevent this TSC adjustment failure, we need to estimate the sync
overhead, which will be at least the cost of an unlock operation on one
CPU followed by a lock operation on another CPU.

The measurement is done in check_tsc_sync_target() after stop_count
reaches 2, which is set by the source CPU after it re-initializes the
sync variables; this leaves the lock cacheline remote from the target
CPU. The subsequent time measurement will then be similar to the latency
between successive iterations of the 2-CPU sync loop in check_tsc_warp().

Interrupts should not yet be enabled when check_tsc_sync_target() is
called. However, some interference may cause the overhead estimate to
vary a bit. With this patch applied, the measured overhead on the same
CooperLake system varies from 104 to 326 cycles across reboot runs.

Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
arch/x86/kernel/tsc_sync.c | 27 ++++++++++++++++++---------
1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 70aeb254b62b..e2c43ba4e7b8 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -445,6 +445,7 @@ void check_tsc_sync_target(void)
struct tsc_adjust *cur = this_cpu_ptr(&tsc_adjust);
unsigned int cpu = smp_processor_id();
cycles_t cur_max_warp, gbl_max_warp;
+ cycles_t start, sync_overhead;
int cpus = 2;

/* Also aborts if there is no TSC. */
@@ -505,29 +506,37 @@ void check_tsc_sync_target(void)
if (!atomic_read(&test_runs))
return;

+ /*
+ * Estimate the synchronization overhead by measuring the time for
+ * a lock/unlock operation.
+ */
+ start = rdtsc_ordered();
+ arch_spin_lock(&sync.lock);
+ arch_spin_unlock(&sync.lock);
+ sync_overhead = rdtsc_ordered() - start;
+
/*
* If the warp value of this CPU is 0, then the other CPU
* observed time going backwards so this TSC was ahead and
* needs to move backwards.
*/
- if (!cur_max_warp)
+ if (!cur_max_warp) {
cur_max_warp = -gbl_max_warp;
+ sync_overhead = -sync_overhead;
+ }

/*
* Add the result to the previous adjustment value.
*
* The adjustment value is slightly off by the overhead of the
* sync mechanism (observed values are ~200 TSC cycles), but this
- * really depends on CPU, node distance and frequency. So
- * compensating for this is hard to get right. Experiments show
- * that the warp is not longer detectable when the observed warp
- * value is used. In the worst case the adjustment needs to go
- * through a 3rd run for fine tuning.
+ * really depends on CPU, node distance and frequency. Add the
+ * estimated sync overhead to the adjustment value.
*/
- cur->adjusted += cur_max_warp;
+ cur->adjusted += cur_max_warp + sync_overhead;

- pr_warn("TSC ADJUST compensate: CPU%u observed %lld warp. Adjust: %lld\n",
- cpu, cur_max_warp, cur->adjusted);
+ pr_warn("TSC ADJUST compensate: CPU%u observed %lld warp (overhead %lld). Adjust: %lld\n",
+ cpu, cur_max_warp, sync_overhead, cur->adjusted);

wrmsrl(MSR_IA32_TSC_ADJUST, cur->adjusted);
goto retry;
--
2.27.0