Re: [RFC PATCH v7 7/7] Restartable sequences: self-tests

From: Boqun Feng
Date: Fri Aug 12 2016 - 21:29:19 EST


On Fri, Aug 12, 2016 at 06:11:45PM +0000, Mathieu Desnoyers wrote:
> ----- On Aug 12, 2016, at 12:35 PM, Boqun Feng boqun.feng@xxxxxxxxx wrote:
>
> > On Fri, Aug 12, 2016 at 01:30:15PM +0800, Boqun Feng wrote:
> > [snip]
> >> > > Besides, do we allow userspace programs do read-only access to the
> >> > > memory objects modified by do_rseq(). If so, we have a problem when
> >> > > there are two writes in a do_rseq()(either in the rseq critical section
> >> > > or in the asm block), because in current implemetation, these two writes
> >> > > are unordered, which makes the readers outside a do_rseq() could observe
> >> > > the ordering of writes differently.
> >> > >
> >> > > For rseq_finish2(), a simple solution would be making the "final" write
> >> > > a RELEASE.
> >> >
> >> > Indeed, we would need a release semantic for the final store here if this
> >> > is the common use. Or we could duplicate the "flavors" of rseq_finish2 and
> >> > add a rseq_finish2_release. We should find a way to eliminate code duplication
> >>
> >> I'm in favor of a separate rseq_finish2_release().
> >>
> >> > there. I suspect we'll end up doing macros.
> >> >
> >>
> >> Me too. Lemme have a try ;-)
> >>
> >
> > How about this? Although a little messy, I separated the asm block into
> > several parts and implemented each part in a arch-diagnose way.
>
> I find it rather hard to follow the per-arch assembly with this approach.
> It might prove to be troublesome if we want to do arch-specific optimizations
> in the future.
>

It might be, but I was just trying to kill as much duplicate code as
possible, because the more duplicate we have, the more maintain effort
we need.

For example, PPC32 and PPC64 may have the same asm code to check the
event counter, but different code to do the final store. Having the
same RSEQ_CHECK_COUNTER() for PPC32 and PPC64 actually makes it easy if
we come up a way to optimize the counter check code on PPC.

And if some arch wants to have some very specifical optimizations,
it could always write the whole asm block again rather than use the
helpers macros.

> I've come up with the following macro approach instead, feedback welcome!
>
>
> commit 4d27431d6aefaee617540ef04518962b0e4d14f4
> Author: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> Date: Thu Aug 11 19:11:27 2016 -0400
>
> rseq_finish2, rseq_finish2_release (WIP)
>
[...]
> +#elif defined(__ARMEL__)
> +
> +#define RSEQ_FINISH_ASM(_target_final, _to_write_final, _start_value, \
> + _failure, extra_store, extra_input) \
> + __asm__ __volatile__ goto ( \
> + ".pushsection __rseq_table, \"aw\"\n\t" \
> + ".balign 32\n\t" \
> + ".word 1f, 0x0, 2f, 0x0, %l[failure], 0x0, 0x0, 0x0\n\t" \
> + ".popsection\n\t" \
> + "1:\n\t" \
> + RSEQ_INJECT_ASM(1) \
> + "adr r0, 3f\n\t" \
> + "str r0, [%[rseq_cs]]\n\t" \
> + RSEQ_INJECT_ASM(2) \
> + "ldr r0, %[current_event_counter]\n\t" \
> + "mov r1, #0\n\t" \
> + "cmp %[start_event_counter], r0\n\t" \
> + "bne %l[failure]\n\t" \
> + RSEQ_INJECT_ASM(3) \
> + extra_store \
> + "str %[to_write_final], [%[target_final]]\n\t" \
> + "2:\n\t" \
> + RSEQ_INJECT_ASM(5) \
> + "str r1, [%[rseq_cs]]\n\t" \

I find this is a little weird here, that is having an extra register for
zeroing the rseq_cs. Could we

"mov r0, #0\n\t"
"str r0, [%[rseq_cs]]\n\t"

here? Which not only saves a register, but also an instruction "mov r1,
#0" in the fast path. Am I missing something subtle?

> + "b 4f\n\t" \
> + ".balign 32\n\t" \
> + "3:\n\t" \
> + ".word 1b, 0x0, 2b, 0x0, l[failure], 0x0, 0x0, 0x0\n\t" \
> + "4:\n\t" \
> + : /* no outputs */ \
> + : [start_event_counter]"r"((_start_value).event_counter), \
> + [current_event_counter]"m"((_start_value).rseqp->u.e.event_counter), \
> + [to_write_final]"r"(_to_write_final), \
> + [target_final]"r"(_target_final), \
> + [rseq_cs]"r"(&(_start_value).rseqp->rseq_cs) \
> + extra_input \
> + RSEQ_INJECT_INPUT \
> + : "r0", "r1", "memory", "cc" \
> + RSEQ_INJECT_CLOBBER \
> + : _failure \
> );

[...]

> +
> +#define RSEQ_FINISH2_RELEASE_SPECULATIVE_STORE_ASM() \
> + RSEQ_FINISH2_SPECULATIVE_STORE_ASM() \
> + "dmb\n\t"
> +

Having a RELEASE barrier here may be OK for all current archs we
support, but there are archs which rather than have a lightweight
RELEASE barrier but use a special instruction for RELEASE operations,
for example, AArch64. Do we need to take that into consideration and
define a RSEQ_FINISH_ASM_RELEASE() rather than a
RSEQ_FINISH2_SPECULATIVE_STORE_INPUT_ASM()?

[...]

Regards
Boqun

Attachment: signature.asc
Description: PGP signature