[PATCH 0/9] sched_clock fixes

From: Peter Zijlstra
Date: Fri Apr 21 2017 - 13:38:54 EST


Hi,

These patches were inspired by (and hopefully fix) two independent bug reports
on Core2 machines.

I never could quite reproduce one, but my Core2 machine no longer switches to
stable sched_clock and therefore no longer tickles the problematic stable ->
unstable transition either.

Before I dug up my Core2 machine, I tried emulating TSC wreckage by poking
random values into the TSC MSR from userspace. Behaviour in that case is
improved as well.
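
For reference, one way such a poke can be done is through the msr driver's
character device, where the file offset selects the MSR and each access is
8 bytes wide. This is only a sketch of the general technique, not necessarily
the method used for the test above; write_msr() is a hypothetical helper name,
and 0x10 is the architectural IA32_TIME_STAMP_COUNTER index.

```c
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Architectural MSR index of the time stamp counter. */
#define MSR_IA32_TSC 0x10

/*
 * Hypothetical helper: write one 64-bit value to MSR 'msr' through an
 * already opened msr device fd (e.g. /dev/cpu/0/msr, opened O_WRONLY as
 * root with the msr module loaded). The msr driver uses the file offset
 * as the MSR index and transfers 8 bytes per access.
 *
 * Returns 0 on success, -1 on a failed or short write.
 */
static int write_msr(int fd, uint32_t msr, uint64_t value)
{
	ssize_t n = pwrite(fd, &value, sizeof(value), (off_t)msr);

	return n == (ssize_t)sizeof(value) ? 0 : -1;
}
```

Calling write_msr(fd, MSR_IA32_TSC, some_value) on a live system changes the
TSC on that one CPU only, which is exactly the kind of wreckage the watchdog
is supposed to catch. Do not do this on a machine you care about.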

People have to realize that if we manage to boot with TSC 'stable' (both
sched_clock and clocksource) and we later find out we were mistaken (we observe
a TSC wobble) the clocks that are derived from it _will_ have had an observable
hiccup. This is fundamentally unfixable.

If you own a machine where the BIOS tries to hide SMI latencies by rewinding
TSC (yes, this is a thing), the very best we can do is mark TSC unstable with a
boot parameter.

For example, this is me writing a stupid value into the TSC:

[ 46.745082] random: crng init done
[18443029775.010069] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[18443029775.023141] clocksource: 'hpet' wd_now: 3ebec538 wd_last: 3e486ec9 mask: ffffffff
[18443029775.034214] clocksource: 'tsc' cs_now: 5025acce9 cs_last: 24dc3bd21c88ee mask: ffffffffffffffff
[18443029775.046651] tsc: Marking TSC unstable due to clocksource watchdog
[18443029775.054211] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[18443029775.064434] sched_clock: Marking unstable (70569005835, -17833788)<-(-3714295689546517, -2965802361)
[ 70.573700] clocksource: Switched to clocksource hpet

With some trace_printk()s (not included) I could tell that the wobble
occurred at 69.965474. The clock now resumes where it 'should' have been.

But an unfortunate scheduling event could have resulted in one task
having seen a runtime of ~584 years with 'obvious' effects. Similar
jumps can also be observed from userspace GTOD usage.
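
The ~584 years figure is what you get when a nanosecond clock steps backwards
and the delta is computed in unsigned 64-bit arithmetic: the result wraps to
just under 2^64 ns. A minimal sketch of that arithmetic follows; the helper
names are made up for illustration and are not kernel functions, and a year
is taken as 365 days.

```c
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL
#define SEC_PER_YEAR	(365ULL * 24 * 60 * 60)

/*
 * Hypothetical helper: runtime delta between two clock reads, in ns.
 * If the clock went backwards (now_ns < start_ns) the unsigned
 * subtraction wraps around to a value just below 2^64.
 */
static uint64_t runtime_delta_ns(uint64_t start_ns, uint64_t now_ns)
{
	return now_ns - start_ns;
}

/* Hypothetical helper: whole years contained in a nanosecond count. */
static uint64_t ns_to_years(uint64_t ns)
{
	return ns / NSEC_PER_SEC / SEC_PER_YEAR;
}
```

For instance, with a start stamp of 69965474000 ns and a second read that is
1 ms earlier, runtime_delta_ns() returns 2^64 - 1000000 and ns_to_years() of
that is 584. The huge printk timestamps in the log above are consistent with
the same effect: 18443029775 s sits just below 2^64 ns (about 18446744073 s).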