'perf test tsc' failing, bisected to "sched/clock: Provide better clock continuity"

From: Arnaldo Carvalho de Melo
Date: Thu Mar 16 2017 - 10:01:47 EST


Hi, this test entry has been failing for a while:

[root@jouet ~]# perf test -v tsc
55: Convert perf time to TSC :
--- start ---
test child forked, pid 3008
mmap size 528384B
1st event perf time 93133455486631 tsc 15369449468752
rdtsc time 93133464598760 tsc 15369473104358
2nd event perf time 93133455506961 tsc 15369449521485
test child finished with -1
---- end ----
Convert perf time to TSC: FAILED!
[root@jouet ~]#

I bisected it to the following kernel change, ideas?

[acme@felicio linux]$ git bisect good
5680d8094ffa9e5cfc81afdd865027ee6417c263 is the first bad commit
commit 5680d8094ffa9e5cfc81afdd865027ee6417c263
Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date: Thu Dec 15 13:36:17 2016 +0100

sched/clock: Provide better clock continuity

When switching between the unstable and stable variants it is
currently possible that clock discontinuities occur.

And while these will mostly be 'small', attempt to do better.

As observed on my IVB-EP, the sched_clock() is ~1.5s ahead of the
ktime_get_ns() based timeline at the point of switchover
(sched_clock_init_late()) after SMP bringup.

Equally, when the TSC is later found to be unstable -- typically
because SMM tries to hide its SMI latencies by mucking with the TSC --
we want to avoid large jumps.

Since the clocksource watchdog reports the issue after the fact we
cannot exactly fix up time, but since SMI latencies are typically
small (~10ns range), the discontinuity is mainly due to drift between
sched_clock() and ktime_get_ns() (which on my desktop is ~79s over
24days).

I dislike this patch because it adds overhead to the good case in
favour of dealing with badness. But given the widespread failure of
TSC stability this is worth it.

Note that in case the TSC makes drastic jumps after SMP bringup we're
still hosed. There's just not much we can do in that case without
stupid overhead.

If we were to somehow expose tsc_clocksource_reliable (which is hard
because this code is also used on ia64 and parisc) we could avoid some
of the newly introduced overhead.

Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Mike Galbraith <efault@xxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>

:040000 040000 152545abe3b879aaa3cf053cdd58ef998c285529 3afcd0a5bc643fdd0fc994ee11cbfd87cfe4c30f M kernel
[acme@felicio linux]$