Re: [PATCH v3 21/26] arm64: Introduce asm/vdso/arch_timer.h

From: Mark Rutland
Date: Mon Mar 16 2020 - 06:28:54 EST


Hi Vincenzo,

On Fri, Mar 13, 2020 at 03:43:40PM +0000, Vincenzo Frascino wrote:
> The vDSO library should only include the necessary headers required for
> a userspace library (UAPI and a minimal set of kernel headers). To make
> this possible it is necessary to isolate from the kernel headers the
> common parts that are strictly necessary to build the library.
>
> Introduce asm/vdso/arch_timer.h to contain all the arm64 specific
> code. This allows replacing the second isb() in __arch_get_hw_counter()
> with a fake dependent stack read of the counter, which improves vdso
> library performance by ~4.5%. Below are the results of vdsotest [1] run
> for 100 iterations.
>
> Before the patch:
> =================
> clock-gettime-monotonic: syscall: 771 nsec/call
> clock-gettime-monotonic: libc: 130 nsec/call
> clock-gettime-monotonic: vdso: 111 nsec/call
> ...
> clock-gettime-realtime: syscall: 762 nsec/call
> clock-gettime-realtime: libc: 130 nsec/call
> clock-gettime-realtime: vdso: 111 nsec/call
>
> After the patch:
> ================
> clock-gettime-monotonic: syscall: 792 nsec/call
> clock-gettime-monotonic: libc: 124 nsec/call
> clock-gettime-monotonic: vdso: 106 nsec/call
> ...
> clock-gettime-realtime: syscall: 776 nsec/call
> clock-gettime-realtime: libc: 124 nsec/call
> clock-gettime-realtime: vdso: 106 nsec/call
>
> [1] https://github.com/nathanlynch/vdsotest
>
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> Cc: Marc Zyngier <maz@xxxxxxxxxx>
> Cc: Mark Rutland <Mark.Rutland@xxxxxxx>
> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@xxxxxxx>
> ---
> arch/arm64/include/asm/arch_timer.h | 29 ++++---------------
> arch/arm64/include/asm/vdso/arch_timer.h | 33 ++++++++++++++++++++++
> arch/arm64/include/asm/vdso/gettimeofday.h | 7 +++--
> 3 files changed, 42 insertions(+), 27 deletions(-)
> create mode 100644 arch/arm64/include/asm/vdso/arch_timer.h
>
> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> index 7ae54d7d333a..7f22cd00ad45 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -164,24 +164,7 @@ static inline void arch_timer_set_cntkctl(u32 cntkctl)
> isb();
> }
>
> -/*
> - * Ensure that reads of the counter are treated the same as memory reads
> - * for the purposes of ordering by subsequent memory barriers.
> - *
> - * This insanity brought to you by speculative system register reads,
> - * out-of-order memory accesses, sequence locks and Thomas Gleixner.
> - *
> - * http://lists.infradead.org/pipermail/linux-arm-kernel/2019-February/631195.html
> - */
> -#define arch_counter_enforce_ordering(val) do { \
> - u64 tmp, _val = (val); \
> - \
> - asm volatile( \
> - " eor %0, %1, %1\n" \
> - " add %0, sp, %0\n" \
> - " ldr xzr, [%0]" \
> - : "=r" (tmp) : "r" (_val)); \
> -} while (0)
> +#include <asm/vdso/arch_timer.h>
>
> static __always_inline u64 __arch_counter_get_cntpct_stable(void)
> {
> @@ -189,7 +172,7 @@ static __always_inline u64 __arch_counter_get_cntpct_stable(void)
>
> isb();
> cnt = arch_timer_reg_read_stable(cntpct_el0);
> - arch_counter_enforce_ordering(cnt);
> + cnt = arch_counter_enforce_ordering(cnt);
> return cnt;

Why have you changed the structure of arch_counter_enforce_ordering() to
return a value? The commit message has no rationale for that.

If there is a reason to change that, I'd prefer the driver change as one
patch, before moving the definition.

[...]

> +/*
> + * Ensure that reads of the counter are treated the same as memory reads
> + * for the purposes of ordering by subsequent memory barriers.
> + *
> + * This insanity brought to you by speculative system register reads,
> + * out-of-order memory accesses, sequence locks and Thomas Gleixner.
> + *
> + * http://lists.infradead.org/pipermail/linux-arm-kernel/2019-February/631195.html
> + *
> + */
> +static u64 arch_counter_enforce_ordering(u64 val)
> +{
> +	u64 tmp, _val = (val);
> +
> +	asm volatile(
> +	"	eor	%0, %1, %1\n"
> +	"	add	%0, sp, %0\n"
> +	"	ldr	xzr, [%0]"
> +	: "=r" (tmp) : "r" (_val));
> +
> +	return _val;
> +}

This change has no functional effect. Since `_val` is only passed in as
an input parameter, the compiler can assume the assembly has no effect
on it.

As above, what is the rationale for changing this?
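
To illustrate the point about operand constraints, here is a minimal,
architecture-neutral sketch. It is an illustration under stated
assumptions, not the kernel code: the empty asm templates stand in for
the eor/add/ldr sequence, and the names order_input_only(),
order_in_out() and check_same_value() are hypothetical. With an
input-only "r" operand the compiler is free to assume the value is
unchanged across the asm; only an in/out "+r" operand makes it treat the
returned value as something the asm produced.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the patched helper: 'val' is an input-only
 * operand, so the compiler may assume the asm leaves it untouched and
 * the returned value carries no dependency on the asm.
 */
uint64_t order_input_only(uint64_t val)
{
	uint64_t tmp;

	/* Empty template: placeholder for the real eor/add/ldr sequence. */
	__asm__ volatile("" : "=r" (tmp) : "r" (val));
	(void)tmp;

	return val;
}

/*
 * Hypothetical variant: 'val' is an in/out operand ("+r"), so the
 * compiler must treat the returned value as an output of the asm.
 */
uint64_t order_in_out(uint64_t val)
{
	__asm__ volatile("" : "+r" (val));

	return val;
}

/* Both variants hand back the value they were given. */
void check_same_value(uint64_t val)
{
	assert(order_input_only(val) == val);
	assert(order_in_out(val) == val);
}
```

Both helpers return the same number. The difference is only in what the
compiler is allowed to assume, which is why assigning the result of the
input-only version back to `cnt` changes nothing.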

> @@ -82,10 +83,10 @@ static __always_inline u64 __arch_get_hw_counter(s32 clock_mode)
> isb();
> asm volatile("mrs %0, cntvct_el0" : "=r" (res) :: "memory");
> /*
> - * This isb() is required to prevent that the seq lock is
> - * speculated.#
> + * arch_counter_enforce_ordering() is required to prevent that
> + * the seq lock is speculated.
> */
> - isb();
> + res = arch_counter_enforce_ordering(res);

Can we delete the comment entirely? We don't bother in <asm/arch_timer.h>.

Even better, can we factor out __arch_counter_get_cntvct(), and use
that?

Thanks,
Mark.