[tip:timers/vdso] lib/vdso: Make delta calculation work correctly

From: tip-bot for Thomas Gleixner
Date: Wed Jun 26 2019 - 08:37:47 EST


Commit-ID: 9d90b93bf325e015bbae31b83f16da5e4e17effa
Gitweb: https://git.kernel.org/tip/9d90b93bf325e015bbae31b83f16da5e4e17effa
Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
AuthorDate: Wed, 26 Jun 2019 12:02:00 +0200
Committer: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
CommitDate: Wed, 26 Jun 2019 14:26:53 +0200

lib/vdso: Make delta calculation work correctly

The x86 vdso implementation on which the generic vdso library is based on
has subtle (unfortunately undocumented) twists:

1) The code assumes that the clocksource mask is U64_MAX which means that
no bits are masked. Which is true for any valid x86 VDSO clocksource.
Stupidly it still did the mask operation for no reason and at the wrong
place right after reading the clocksource.

2) It contains a sanity check to catch the case where slightly
unsynchronized TSC values can be observed which would cause the delta
calculation to make a huge jump. It therefore checks whether the
current TSC value is larger than the value on which the current
conversion is based on. If it's not larger the base value is used to
prevent time jumps.

#1 Is not only stupid for the X86 case because it does the masking for no
reason it is also completely wrong for clocksources with a smaller mask
which can legitimately wrap around during a conversion period. The core
timekeeping code does it correct by applying the mask after the delta
calculation:

(now - base) & mask

#2 is equally broken for clocksources which have smaller masks and can wrap
around during a conversion period because there the now > base check is
just wrong and causes stale time stamps and time going backwards issues.

Unbreak it by:

1) Removing the mask operation from the clocksource read which makes the
fallback detection work for all clocksources

2) Replacing the conditional delta calculation with a overrideable inline
function.

#2 could reuse clocksource_delta() from the timekeeping code but that
results in a significant performance hit for the x86 VSDO. The timekeeping
core code must have the non optimized version as it has to operate
correctly with clocksources which have smaller masks as well to handle the
case where TSC is discarded as timekeeper clocksource and replaced by HPET
or pmtimer. For the VDSO there is no replacement clocksource. If TSC is
unusable the syscall is enforced which does the right thing.

To accommodate to the needs of various architectures provide an
override-able inline function which defaults to the regular delta
calculation with masking:

(now - base) & mask

Override it for x86 with the non-masking and checking version.

This unbreaks the ARM64 syscall fallback operation, allows to use
clocksources with arbitrary width and preserves the performance
optimization for x86.

Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@xxxxxxx>
Cc: linux-arch@xxxxxxxxxxxxxxx
Cc: LAK <linux-arm-kernel@xxxxxxxxxxxxxxxxxxx>
Cc: linux-mips@xxxxxxxxxxxxxxx
Cc: linux-kselftest@xxxxxxxxxxxxxxx
Cc: catalin.marinas@xxxxxxx
Cc: Will Deacon <will.deacon@xxxxxxx>
Cc: Arnd Bergmann <arnd@xxxxxxxx>
Cc: linux@xxxxxxxxxxxxxxx
Cc: Ralf Baechle <ralf@xxxxxxxxxxxxxx>
Cc: paul.burton@xxxxxxxx
Cc: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
Cc: salyzyn@xxxxxxxxxxx
Cc: pcc@xxxxxxxxxx
Cc: shuah@xxxxxxxxxx
Cc: 0x7f454c46@xxxxxxxxx
Cc: linux@xxxxxxxxxxxxxxxxxx
Cc: huw@xxxxxxxxxxxxxxx
Cc: sthotton@xxxxxxxxxxx
Cc: andre.przywara@xxxxxxx
Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1906261159230.32342@xxxxxxxxxxxxxxxxxxxxxxx

---
arch/x86/include/asm/vdso/gettimeofday.h | 27 +++++++++++++++++++++++++++
lib/vdso/gettimeofday.c | 19 +++++++++++++++----
2 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/vdso/gettimeofday.h b/arch/x86/include/asm/vdso/gettimeofday.h
index 5b63f1f78a1f..a14039a59abd 100644
--- a/arch/x86/include/asm/vdso/gettimeofday.h
+++ b/arch/x86/include/asm/vdso/gettimeofday.h
@@ -229,6 +229,33 @@ static __always_inline const struct vdso_data *__arch_get_vdso_data(void)
return __vdso_data;
}

+/*
+ * x86 specific delta calculation.
+ *
+ * The regular implementation assumes that clocksource reads are globally
+ * monotonic. The TSC can be slightly off across sockets which can cause
+ * the regular delta calculation (@cycles - @last) to return a huge time
+ * jump.
+ *
+ * Therefore it needs to be verified that @cycles are greater than
+ * @last. If not then use @last, which is the base time of the current
+ * conversion period.
+ *
+ * This variant also removes the masking of the subtraction because the
+ * clocksource mask of all VDSO capable clocksources on x86 is U64_MAX
+ * which would result in a pointless operation. The compiler cannot
+ * optimize it away as the mask comes from the vdso data and is not compile
+ * time constant.
+ */
+static __always_inline
+u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
+{
+ if (cycles > last)
+ return (cycles - last) * mult;
+ return 0;
+}
+#define vdso_calc_delta vdso_calc_delta
+
#endif /* !__ASSEMBLY__ */

#endif /* __ASM_VDSO_GETTIMEOFDAY_H */
diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index ef28cc5d7bff..2d1c1f241fd9 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -26,6 +26,18 @@
#include <asm/vdso/gettimeofday.h>
#endif /* ENABLE_COMPAT_VDSO */

+#ifndef vdso_calc_delta
+/*
+ * Default implementation which works for all sane clocksources. That
+ * obviously excludes x86/TSC.
+ */
+static __always_inline
+u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
+{
+ return ((cycles - last) & mask) * mult;
+}
+#endif
+
static int do_hres(const struct vdso_data *vd, clockid_t clk,
struct __kernel_timespec *ts)
{
@@ -35,14 +47,13 @@ static int do_hres(const struct vdso_data *vd, clockid_t clk,

do {
seq = vdso_read_begin(vd);
- cycles = __arch_get_hw_counter(vd->clock_mode) &
- vd->mask;
+ cycles = __arch_get_hw_counter(vd->clock_mode);
ns = vdso_ts->nsec;
last = vd->cycle_last;
if (unlikely((s64)cycles < 0))
return clock_gettime_fallback(clk, ts);
- if (cycles > last)
- ns += (cycles - last) * vd->mult;
+
+ ns += vdso_calc_delta(cycles, last, vd->mask, vd->mult);
ns >>= vd->shift;
sec = vdso_ts->sec;
} while (unlikely(vdso_read_retry(vd, seq)));