Re: Rough notes from sys_membarrier() lightning BoF

From: Mathieu Desnoyers
Date: Wed Sep 20 2017 - 14:13:04 EST


----- On Sep 20, 2017, at 12:02 PM, Andy Lutomirski luto@xxxxxxxxxx wrote:

> On Sun, Sep 17, 2017 at 3:36 PM, Paul E. McKenney
> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>> Hello!
>>
>> Rough notes from our discussion last Thursday. Please reply to the
>> group with any needed elaborations or corrections.
>>
>> Adding Andy and Michael on CC since this most closely affects their
>> architectures. Also adding Dave Watson and Maged Michael because
>> the preferred approach requires that processes wanting to use the
>> lightweight sys_membarrier() do a registration step.
>
> Not to be too much of a curmudgeon, but I think that there should be a
> real implementation of the isync membarrier before this gets merged.
> This series purports to solve two problems, ppc barriers and x86
> exit-without-isync, but it's very hard to evaluate whether it actually
> solves the latter problem given the complete lack of x86 or isync code
> in the current RFC.
>
> It still seems to me that you won't get any particular advantage from
> using this registration mechanism on x86 even when you implement
> isync. Unless I've misunderstood, the only real issue on x86 is that
> you need a helper like arch_force_isync_before_usermode(), and that
> helper doesn't presently exist. That means that this whole patchset
> is standing on very dangerous ground: you'll end up with an efficient
> implementation that works just fine without even requesting
> registration on every architecture except ppc. That way lies
> userspace bugs.

My proposed RFC for private expedited membarrier enforces that all
architectures perform the registration step. Using the "PRIVATE_EXPEDITED"
command without prior process registration returns an error on all
architectures. The goal here is to make all architectures behave in the
same way, and it allows us to rely on process registration to deal
with future arch-specific optimizations.
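To make the registration rule concrete, here is a toy user-space model. It is not kernel code, and every toy_* name is hypothetical. It shows only the gating: PRIVATE_EXPEDITED fails until the process has registered, with the same check on every architecture. The RFC text above only says "an error". The -EPERM used here is an assumption for the sketch, chosen because it is the code membarrier(2) documents for this case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: one flag standing in for the kernel's per-process (mm) state. */
struct toy_mm {
    bool private_expedited_ready; /* set by the registration command */
};

enum toy_cmd {
    TOY_CMD_PRIVATE_EXPEDITED,
    TOY_CMD_REGISTER_PRIVATE_EXPEDITED,
};

/* Returns 0 on success or a negative errno value. */
static int toy_membarrier(struct toy_mm *mm, enum toy_cmd cmd)
{
    assert(mm != NULL);
    switch (cmd) {
    case TOY_CMD_REGISTER_PRIVATE_EXPEDITED:
        mm->private_expedited_ready = true;
        return 0;
    case TOY_CMD_PRIVATE_EXPEDITED:
        /* Same check on every architecture: no registration, no barrier. */
        if (!mm->private_expedited_ready)
            return -EPERM;
        /* A real kernel would now IPI CPUs running threads of this process. */
        return 0;
    }
    return -EINVAL; /* unknown command */
}
```

Because unregistered use fails everywhere, userspace cannot come to depend on x86 "just working" without registration. Arch-specific work can then hang off the registration step later.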

Adding the "core_sync" behavior could then be done in the next kernel
merge window. I currently foresee two possible ABI approaches to
expose it:

Approach 1:

Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE and
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE commands. This
lets MEMBARRIER_CMD_QUERY report their availability.
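Under Approach 1, a library would probe support with MEMBARRIER_CMD_QUERY and test bits. What follows is a minimal sketch of that decision. It assumes QUERY returns a bitmask of supported commands, or a negative value on failure. The A1_* bit values and a1_* function names are hypothetical placeholders, not assigned ABI numbers.

```c
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
#define A1_CMD_PRIVATE_EXPEDITED                    (1 << 3)
#define A1_CMD_REGISTER_PRIVATE_EXPEDITED           (1 << 4)
#define A1_CMD_PRIVATE_EXPEDITED_SYNC_CORE          (1 << 5)
#define A1_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE (1 << 6)

/* Approach 1: core serialization is just two more bits in the QUERY mask. */
static bool a1_has_sync_core(int query_mask)
{
    int need = A1_CMD_PRIVATE_EXPEDITED_SYNC_CORE |
               A1_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE;
    return query_mask >= 0 && (query_mask & need) == need;
}

/* Choose the registration command matching the barrier issued later.
 * Returns -1 when the requested flavor is not available. */
static int a1_pick_register_cmd(int query_mask, bool want_sync_core)
{
    if (query_mask < 0)
        return -1; /* QUERY itself failed */
    if (want_sync_core)
        return a1_has_sync_core(query_mask)
               ? A1_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE : -1;
    return (query_mask & A1_CMD_REGISTER_PRIVATE_EXPEDITED)
           ? A1_CMD_REGISTER_PRIVATE_EXPEDITED : -1;
}
```

Returning -1 instead of silently falling back to the plain command keeps a caller that needs core serialization from running without it.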

Approach 2:

Add a MEMBARRIER_FLAG_SYNC_CORE value for the flags parameter. Setting
it when issuing both MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED and
MEMBARRIER_CMD_PRIVATE_EXPEDITED would request core serializing
behavior. Whether core serialization is supported could be queried
by issuing the MEMBARRIER_CMD_QUERY command with the
MEMBARRIER_FLAG_SYNC_CORE flag set.
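Approach 2 moves the choice into the flags argument. The toy model below is not kernel code, and its a2_*/A2_* names and values are hypothetical. It fills in two details the proposal leaves open, both as assumptions. First, an unsupported flag makes QUERY fail with -EINVAL. Second, issuing with the flag requires having registered with it.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical names and values; Approach 2 is only a proposal. */
#define A2_CMD_QUERY                      0
#define A2_CMD_PRIVATE_EXPEDITED          (1 << 3)
#define A2_CMD_REGISTER_PRIVATE_EXPEDITED (1 << 4)
#define A2_FLAG_SYNC_CORE                 (1 << 0)

struct a2_mm {
    bool registered;           /* REGISTER_PRIVATE_EXPEDITED was issued */
    bool registered_sync_core; /* ...and it carried A2_FLAG_SYNC_CORE */
};

/* Whether this imaginary architecture can provide core serialization. */
static bool a2_arch_has_sync_core = true;

static int a2_membarrier(struct a2_mm *mm, int cmd, int flags)
{
    if (flags & ~A2_FLAG_SYNC_CORE)
        return -EINVAL; /* unknown flag bits */
    if ((flags & A2_FLAG_SYNC_CORE) && !a2_arch_has_sync_core)
        return -EINVAL; /* flag not supported on this architecture */

    switch (cmd) {
    case A2_CMD_QUERY:
        /* With the flag set, a non-negative result means "supported". */
        return A2_CMD_PRIVATE_EXPEDITED | A2_CMD_REGISTER_PRIVATE_EXPEDITED;
    case A2_CMD_REGISTER_PRIVATE_EXPEDITED:
        mm->registered = true;
        if (flags & A2_FLAG_SYNC_CORE)
            mm->registered_sync_core = true;
        return 0;
    case A2_CMD_PRIVATE_EXPEDITED:
        if (!mm->registered)
            return -EPERM;
        if ((flags & A2_FLAG_SYNC_CORE) && !mm->registered_sync_core)
            return -EPERM; /* must have registered with the same flag */
        return 0;
    }
    return -EINVAL; /* unknown command */
}
```

Compared with Approach 1, the command set stays the same and the flag carries the variant. Support for each flag must then be probed by calling QUERY once per flag, rather than by reading a single bitmask.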

Any other ideas? Does either approach seem better?

>
> Also, can you elaborate on the PPC issue? PPC appears to track
> mm_cpumask more or less just like x86. Is the issue just that this
> tracking has no implied barriers? If so, how does TLB flush on ppc
> work? It really does seem impressive to me that an architecture can
> efficiently support munmap() but not an expedited private membarrier.

I'll leave this question to the PPC experts :)

Thanks,

Mathieu

>
> --Andy

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com