Re: Rough notes from sys_membarrier() lightning BoF

From: Andy Lutomirski
Date: Wed Sep 20 2017 - 14:19:14 EST


On Wed, Sep 20, 2017 at 11:13 AM, Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> ----- On Sep 20, 2017, at 12:02 PM, Andy Lutomirski luto@xxxxxxxxxx wrote:
>
> > On Sun, Sep 17, 2017 at 3:36 PM, Paul E. McKenney
> > <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> >> Hello!
> >>
> >> Rough notes from our discussion last Thursday. Please reply to the
> >> group with any needed elaborations or corrections.
> >>
> >> Adding Andy and Michael on CC since this most closely affects their
> >> architectures. Also adding Dave Watson and Maged Michael because
> >> the preferred approach requires that processes wanting to use the
> >> lightweight sys_membarrier() do a registration step.
> >
> > Not to be too much of a curmudgeon, but I think that there should be a
> > real implementation of the isync membarrier before this gets merged.
> > This series purports to solve two problems, ppc barriers and x86
> > exit-without-isync, but it's very hard to evaluate whether it actually
> > solves the latter problem given the complete lack of x86 or isync code
> > in the current RFC.
> >
> > It still seems to me that you won't get any particular advantage for
> > using this registration mechanism on x86 even when you implement
> > isync. Unless I've misunderstood, the only real issue on x86 is that
> > you need a helper like arch_force_isync_before_usermode(), and that
> > helper doesn't presently exist. That means that this whole patchset
> > is standing on very dangerous ground: you'll end up with an efficient
> > implementation that works just fine without even requesting
> > registration on every architecture except ppc. That way lies
> > userspace bugs.
>
> My proposed RFC for private expedited membarrier enforces that all
> architectures perform the registration step. Using the "PRIVATE_EXPEDITED"
> command without prior process registration returns an error on all
> architectures. The goal here is to make all architectures behave in the
> same way, and it allows us to rely on process registration to deal
> with future arch-specific optimizations.

Fair enough.

That being said, on some architectures (which may well be all but
PPC), it might be nice if the registration call literally just sets a
flag in the mm saying that it happened so that the registration
enforcement can be done.
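
As a rough sketch of what "just sets a flag" could look like, here is a
small userspace model, not kernel code. The struct, the enum values and
the function name are all made up for illustration, and the choice of
-EPERM for the unregistered case is an assumption; the RFC text quoted
above only says that an error is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-process mm state; not a kernel type. */
struct mm_sketch {
	bool private_expedited_registered;
};

/* Hypothetical command names; these are not the real ABI values. */
enum sketch_cmd {
	SKETCH_CMD_REGISTER_PRIVATE_EXPEDITED,
	SKETCH_CMD_PRIVATE_EXPEDITED,
};

/*
 * Model of the enforcement: registration only records that it happened,
 * and the expedited command refuses to run until the flag is set.
 */
static int sketch_membarrier(struct mm_sketch *mm, enum sketch_cmd cmd)
{
	assert(mm != NULL);

	switch (cmd) {
	case SKETCH_CMD_REGISTER_PRIVATE_EXPEDITED:
		/* On most architectures this could be the whole job. */
		mm->private_expedited_registered = true;
		return 0;
	case SKETCH_CMD_PRIVATE_EXPEDITED:
		if (!mm->private_expedited_registered)
			return -EPERM;	/* assumed error code */
		/* The arch-specific barrier work would go here. */
		return 0;
	}
	return -EINVAL;
}
```

With this shape, an architecture that needs extra work at registration
time (ppc, per the discussion above) would add it in the register case,
and everyone else pays only for the flag.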

>
>
> Adding the "core_sync" behavior could then be done for the next kernel
> merge window. I'm currently foreseeing two possible ABI approaches to
> expose it:
>
> Approach 1:
>
> Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE and
> MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE commands. This
> allows us to return their availability through MEMBARRIER_CMD_QUERY.
>
> Approach 2:
>
> Add a "MEMBARRIER_FLAG_SYNC_CORE" as flag parameter. It could be set
> when issuing both MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED and
> MEMBARRIER_CMD_PRIVATE_EXPEDITED, thus ensuring core serializing
> behavior. Querying whether core serialization is supported could
> be done by issuing the MEMBARRIER_CMD_QUERY command with the
> MEMBARRIER_FLAG_SYNC_CORE flag set.
>
> Any other ideas? Does either approach seem better?


It doesn't seem to make much difference to me.
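
For comparison, the two query styles could be modeled roughly as below.
This is a userspace sketch only: the SK_* names and their numeric values
are placeholders rather than the proposed ABI, and the behavior of a
flagged query under approach 2 (return the commands that honor the flag,
or 0 if none do) is my assumption, since the proposal does not spell out
the return value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder command bits; not the real or proposed ABI values. */
enum {
	SK_CMD_PRIVATE_EXPEDITED                    = 1 << 0,
	SK_CMD_REGISTER_PRIVATE_EXPEDITED           = 1 << 1,
	SK_CMD_PRIVATE_EXPEDITED_SYNC_CORE          = 1 << 2,
	SK_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE = 1 << 3,
};

/* Placeholder flag bit for approach 2. */
#define SK_FLAG_SYNC_CORE (1 << 0)

#define SK_BASE_CMDS \
	(SK_CMD_PRIVATE_EXPEDITED | SK_CMD_REGISTER_PRIVATE_EXPEDITED)

/* Approach 1: a single query; sync-core appears as extra command bits. */
static int sk_query_approach1(bool arch_has_sync_core)
{
	int mask = SK_BASE_CMDS;

	if (arch_has_sync_core)
		mask |= SK_CMD_PRIVATE_EXPEDITED_SYNC_CORE |
			SK_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE;
	return mask;
}

/*
 * Approach 2: the query takes flags. Assumed semantics: with the flag
 * set, report only the commands that honor it.
 */
static int sk_query_approach2(int flags, bool arch_has_sync_core)
{
	if (flags & ~SK_FLAG_SYNC_CORE)
		return -EINVAL;		/* unknown flag bits */
	if (flags & SK_FLAG_SYNC_CORE)
		return arch_has_sync_core ? SK_BASE_CMDS : 0;
	return SK_BASE_CMDS;
}

/* Test a query result for a command; callers must rule out errors first. */
static bool sk_supports(int mask, int cmd)
{
	assert(mask >= 0);
	return (mask & cmd) == cmd;
}
```

Either way userspace ends up with one extra check before relying on
core serialization, which is why the choice looks mostly cosmetic.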