Re: Rough notes from sys_membarrier() lightning BoF

From: Mathieu Desnoyers
Date: Mon Sep 18 2017 - 15:09:37 EST


----- On Sep 18, 2017, at 3:04 PM, Alan Stern stern@xxxxxxxxxxxxxxxxxxx wrote:

> On Sun, 17 Sep 2017, Paul E. McKenney wrote:
>
>> Hello!
>>
>> Rough notes from our discussion last Thursday. Please reply to the
>> group with any needed elaborations or corrections.
>>
>> Adding Andy and Michael on CC since this most closely affects their
>> architectures. Also adding Dave Watson and Maged Michael because
>> the preferred approach requires that processes wanting to use the
>> lightweight sys_membarrier() do a registration step.
>>
>> Thanx, Paul
>>
>> ------------------------------------------------------------------------
>>
>> Problem:
>>
>> 1. The current sys_membarrier() introduces an smp_mb() that
>> is not otherwise required on powerpc.
>>
>> 2. The envisioned JIT variant of sys_membarrier() assumes that
>> the return-to-user instruction sequence handles any change
>> to the usermode instruction stream, and Andy Lutomirski's
>> upcoming changes invalidate this assumption. It is believed
>> that powerpc has a similar issue.
>
>> E. Require that threads register before using sys_membarrier() for
>> private or JIT usage. (The historical implementation using
>> synchronize_sched() would continue to -not- require registration,
>> both for compatibility and because there is no need to do so.)
>>
>> For x86 and powerpc, this registration would set a TIF flag
>> on all of the current process's threads. This flag would be
>> inherited by any later thread creation within that process, and
>> would be cleared by fork() and exec(). When this TIF flag is set,
>
> Why a TIF flag, and why clear it during fork()? If a process registers
> to use private expedited sys_membarrier, shouldn't that apply to
> threads it will create in the future just as much as to threads it has
> already created?

In my implementation posted today, I'm not clearing it on fork. The child
inherits from the parent.

Why a TIF flag? It appears to be a convenient way to add an architecture-specific,
single-bit piece of state to each thread. We also don't want to do too much pointer
chasing on the scheduler fast path (current->mm->..).

>
>> the return-to-user path would execute additional code that would
>> ensure that ordering and newly JITed code was handled correctly.
>> We believe that checks for these TIF flags could be combined with
>> existing checks to avoid adding any overhead in the common case
>> where the process was not using these sys_membarrier() features.
>>
>> For all other architectures, the registration step would be
>> a no-op.
>
> Don't we want to fail private expedited sys_membarrier calls if the
> process hasn't registered for them? This requires the registration
> call to set a flag for the process, even on architectures where no
> additional memory barriers are actually needed. It can't be a no-op.

My implementation posted today fails the private expedited command
if the process is not registered yet. We indeed add a new flag in
mm_struct for all architectures to do so.

So why not reuse this flag instead of the TIF flag on powerpc? See my
fast-path pointer-chasing argument above.

Thanks,

Mathieu

>
> Alan Stern

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com