Re: [PATCH -tip] introduce sys_membarrier(): process-wide memory barrier (v9)

From: Josh Triplett
Date: Tue Mar 02 2010 - 20:53:53 EST


On Tue, Mar 02, 2010 at 06:07:10PM -0500, Mathieu Desnoyers wrote:
> * Josh Triplett (josh@xxxxxxxxxxxxxxxx) wrote:
> > On Thu, Feb 25, 2010 at 06:23:16PM -0500, Mathieu Desnoyers wrote:
> > > I am proposing this patch for the 2.6.34 merge window, as I think it is ready
> > > for inclusion.
> > >
> > > Here is an implementation of a new system call, sys_membarrier(), which
> > > executes a memory barrier on all threads of the current process.
> > [...]
> >
> > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> > > Acked-by: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
> > > Acked-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > > Acked-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> > > CC: Nicholas Miell <nmiell@xxxxxxxxxxx>
> > > CC: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > > CC: mingo@xxxxxxx
> > > CC: laijs@xxxxxxxxxxxxxx
> > > CC: dipankar@xxxxxxxxxx
> > > CC: akpm@xxxxxxxxxxxxxxxxxxxx
> > > CC: josh@xxxxxxxxxxxxxxxx
> >
> > Acked-by: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> >
> > I agree that v9 seems ready for inclusion.
>
> Thanks!
>
> >
> > Out of curiosity, do you have any benchmarks for the case of not
> > detecting sys_membarrier dynamically? Detecting it at library
> > initialization time, for instance, or even just compiling to assume its
> > presence? I'd like to know how much that would improve the numbers.
>
> Citing the patch changelog:
>
> Results in liburcu:
>
> Operations in 10s, 6 readers, 2 writers:
>
> (what we previously had)
> memory barriers in reader: 973494744 reads, 892368 writes
> signal-based scheme: 6289946025 reads, 1251 writes
>
> (what we have now, with dynamic sys_membarrier check, expedited scheme)
> memory barriers in reader: 907693804 reads, 817793 writes
> sys_membarrier scheme: 4316818891 reads, 503790 writes
>
> So basically, yes, there is a significant overhead on the read-side if we
> compare the dynamic check (0.39 ns/read per reader) to the signal-based scheme
> (0.26 ns/read per reader) (which only needs the barrier()). On the update-side,
> we cannot care less though.

Just wanted to confirm that the signal results also hold for the
assume-sys_membarrier approach.
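
For anyone re-deriving the per-read figures: they appear to come from
dividing the 10 s run time by the total read count and then by the 6
readers. A minimal sketch of that arithmetic follows; the helper name and
signature are made up for illustration and are not part of liburcu.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Average cost per read in nanoseconds, computed the way the quoted
 * figures appear to be: run time / total reads / number of readers.
 * Hypothetical helper for illustration only.
 */
static double ns_per_read(uint64_t total_reads, double seconds,
                          unsigned int readers)
{
    assert(total_reads > 0 && readers > 0);
    return seconds * 1e9 / (double)total_reads / (double)readers;
}
```

With the changelog numbers, ns_per_read(4316818891, 10.0, 6) gives roughly
0.39 and ns_per_read(6289946025, 10.0, 6) roughly 0.26, which matches the
figures quoted above.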

> > If significant, it might make sense to try to have a mechanism similar
> > to SMP alternatives, to have different code in either case. dlopen,
> > function pointers, runtime code patching (nop out the rmb), or similar.
>
> Yes, definitely. It could also be useful to switch between UP and SMP primitives
> dynamically when spawning the second thread in a process. We should be careful
> when sharing memory maps between processes though.

Might prove useful for some use cases, sure. Not a high priority given
complexity:performance ratio though, I think.
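
To make the function-pointer option concrete, here is a rough sketch of
picking the read-side barrier once at library initialization rather than
testing a flag on every read. All names are hypothetical, the probe for
sys_membarrier is left to the caller (passed in as a bool), and this is
not how liburcu does it. It assumes GCC or Clang for the barrier
primitives, and whether an indirect call actually beats a predictable
branch would need measuring.

```c
#include <stdbool.h>

/*
 * Compiler-only barrier: assumed sufficient on the read side when the
 * writer can force hardware barriers on all reader threads.
 */
static void reader_barrier_compiler_only(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/* Full hardware barrier: fallback when no such mechanism is available. */
static void reader_barrier_full(void)
{
    __sync_synchronize();
}

/* Default to the safe choice until initialization has run. */
static void (*reader_barrier)(void) = reader_barrier_full;

/*
 * Call once at library initialization, before any reader runs.
 * have_membarrier is the result of whatever probe the caller uses.
 */
static void reader_barrier_init(bool have_membarrier)
{
    reader_barrier = have_membarrier ? reader_barrier_compiler_only
                                     : reader_barrier_full;
}
```

Readers would then call reader_barrier() unconditionally. Runtime code
patching would go one step further and remove the indirect call as well,
at the cost of the extra complexity mentioned above.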

- Josh