Re: [RFC] arm64: Enforce gettimeofday vdso structure read ordering

From: Mark Rutland
Date: Mon Aug 22 2016 - 07:38:00 EST


Hi,

On Fri, Aug 19, 2016 at 04:02:08PM -0400, Brent DeGraaf wrote:
> Introduce explicit control-flow logic immediately prior to virtual
> counter register read in all cases so that the mrs read will
> always be accessed after all vdso data elements are read and
> sequence count is verified. Ensure data elements under the
> protection provided by the sequence counter are read only within
> the protection logic loop. Read the virtual counter as soon as
> possible after the data elements are confirmed correctly read,
> rather than after several other operations which can affect time.
> Reduce full barriers required in register read code in favor of
> the lighter-weight one-way barrier supplied by a load-acquire
> wherever possible.

As I commented previously, can you please explain *why* these chagnes
should be made?

* What problem does this patch address?

* Is this a bug fix? If so, what problem can be seen currently?

* Is this an optimisation? If so, how much of an improvement can be
seen?

In the absence of answers to the above, this patch is impossible to
review, and will not get applied. It makes invasive changes, ans we need
a rationale for those.

I've made some comments below, but the above is key.

[...]

> + .macro seqdata_acquire fallback, tzonly=NO_TZ, skipvcnt=0, getdata
> +9999: ldar seqcnt, [vdso_data, #VDSO_TB_SEQ_COUNT]
> +8888: tbnz seqcnt, #0, 9999b
> ldr w_tmp, [vdso_data, #VDSO_USE_SYSCALL]
> - cbnz w_tmp, \fail
> + cbnz w_tmp, \fallback
> + \getdata
> + dmb ishld /* No loads from vdso_data after this point */

What ordering guarantee is the DMB attempting to provide? Given we have
the acquire, I assume some prior load, but I couldn't figure out what
specifically.

> + mov w9, seqcnt
> + ldar seqcnt, [vdso_data, #VDSO_TB_SEQ_COUNT]

Usually, acquire operations pair with a release operation elsewhere. What
does this pair with?

> + cmp w9, seqcnt
> + bne 8888b /* Do not needlessly repeat ldar and its implicit barrier */
> + .if (\tzonly) != NO_TZ
> + cbz x0, \tzonly
> + .endif
> + .if (\skipvcnt) == 0
> + isb
> + mrs x_tmp, cntvct_el0
> + .endif
> .endm

All this conitional code makes the callers somehwat painful to read.

It might be nicer to have this explicit in the calelrs that require it
rather than conditional in the macro.

>
> .macro get_nsec_per_sec res
> @@ -64,9 +70,6 @@ x_tmp .req x8
> * shift.
> */
> .macro get_clock_shifted_nsec res, cycle_last, mult
> - /* Read the virtual counter. */
> - isb
> - mrs x_tmp, cntvct_el0
> /* Calculate cycle delta and convert to ns. */
> sub \res, x_tmp, \cycle_last
> /* We can only guarantee 56 bits of precision. */
> @@ -137,17 +140,12 @@ x_tmp .req x8
> ENTRY(__kernel_gettimeofday)
> .cfi_startproc
> adr vdso_data, _vdso_data
> - /* If tv is NULL, skip to the timezone code. */
> - cbz x0, 2f
> -
> - /* Compute the time of day. */
> -1: seqcnt_acquire
> - syscall_check fail=4f
> - ldr x10, [vdso_data, #VDSO_CS_CYCLE_LAST]
> - /* w11 = cs_mono_mult, w12 = cs_shift */
> - ldp w11, w12, [vdso_data, #VDSO_CS_MONO_MULT]
> - ldp x13, x14, [vdso_data, #VDSO_XTIME_CLK_SEC]
> - seqcnt_check fail=1b
> + seqdata_acquire fallback=4f tzonly=2f getdata=__stringify(\
> + ldr x10, [vdso_data, #VDSO_CS_CYCLE_LAST];\
> + /* w11 = cs_mono_mult, w12 = cs_shift */;\
> + ldp w11, w12, [vdso_data, #VDSO_CS_MONO_MULT];\
> + ldp x13, x14, [vdso_data, #VDSO_XTIME_CLK_SEC];\
> + ldp w4, w5, [vdso_data, #VDSO_TZ_MINWEST])

Why do we need the stringify? Is that just so we can pass the code as a
macro parameter? If so, it really doesn't look like the way to go...

This is unfortunately painful to read.

Thanks,
Mark.