Re: [PATCH tip/core/rcu 4/5] sys_membarrier: Add expedited option

From: Paul E. McKenney
Date: Tue Jul 25 2017 - 12:48:47 EST


On Tue, Jul 25, 2017 at 01:21:08PM +0000, Mathieu Desnoyers wrote:
> ----- On Jul 24, 2017, at 5:58 PM, Paul E. McKenney paulmck@xxxxxxxxxxxxxxxxxx wrote:
>
> > The sys_membarrier() system call has proven too slow for some use
> > cases, which has prompted users to instead rely on TLB shootdown.
> > Although TLB shootdown is much faster, it has the slight disadvantage
> > of not working at all on arm and arm64. This commit therefore adds
> > an expedited option to the sys_membarrier() system call.
>
> Is this now possible because the synchronize_sched_expedited()
> implementation no longer needs to send IPIs to all CPUs? I
> suspect that using tree SRCU now solves this somehow, but can
> you tell us a bit more about why it is now OK to expose this
> to user-space?

I have gotten complaints from several users that sys_membarrier() is too
slow to be useful for them. So they are hacking around this problem by
unmapping a region of memory, thus getting the IPIs and memory barriers
on all CPUs, but with additional mm overhead. Plus this is non-portable,
and fragile with respect to reasonable optimizations, as was discussed
on LKML some time back:

https://marc.info/?l=linux-kernel&m=142619683526482
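
For readers who have not seen the workaround, here is a minimal sketch of the kind of thing those users do. It is an illustration only, not code taken from any of them: the helper name xmpl_barrier_via_unmap() is hypothetical, and whether the munmap() really produces IPIs and memory barriers on every CPU depends on the architecture and on whatever optimizations the mm code applies, which is exactly the fragility described above.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one anonymous page, touch it, unmap it.
 * The caller is relying on the unmap to trigger a TLB shootdown;
 * nothing in the API guarantees that side effect.
 * Returns 0 on success, -1 on failure (errno set by mmap/munmap).
 */
int xmpl_barrier_via_unmap(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;

	/* Fault the page in so there is a real translation to tear down. */
	*(volatile char *)p = 1;

	/* The side effect the caller is hoping for happens in here. */
	return munmap(p, len);
}
```

Note the extra mmap(), page fault, and munmap() on every call: that is the additional mm overhead mentioned above.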

So we really need to make sys_membarrier() work for these users.
If we don't, we certainly will look quite silly criticizing them
for invoking TLB shootdown via unmapping, now won't we?

Now back in 2015, expedited grace periods were horribly slow, but
I have optimized them to the point that they should be no worse than
TLB shootdown IPIs. Plus sys_membarrier() is portable, and not subject
to death by optimization.
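
As a rough sketch of how user space might consume the proposed command, the fragment below queries the supported-command bitmask and prefers the expedited variant when the kernel advertises it. The assumptions: the XMPL_ names are hypothetical local definitions, the expedited value mirrors this patch (bit 1) rather than any released uapi header, and the fallback policy is just one reasonable choice.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local copies of the command values; XMPL_CMD_SHARED_EXPEDITED is
 * the value proposed in this patch, not a released header constant. */
enum {
	XMPL_CMD_QUERY            = 0,
	XMPL_CMD_SHARED           = 1 << 0,
	XMPL_CMD_SHARED_EXPEDITED = 1 << 1,	/* same value as (2 << 0) */
};

/* Thin wrapper; reports ENOSYS if the headers lack the syscall number. */
int xmpl_membarrier(int cmd, int flags)
{
#ifdef __NR_membarrier
	return (int)syscall(__NR_membarrier, cmd, flags);
#else
	(void)cmd;
	(void)flags;
	errno = ENOSYS;
	return -1;
#endif
}

/* Pick a command from the mask returned by the query command.
 * Returns the command to use, or -1 if nothing usable is advertised. */
int xmpl_pick_cmd(int supported_mask)
{
	if (supported_mask < 0)
		return -1;	/* the query itself failed */
	if (supported_mask & XMPL_CMD_SHARED_EXPEDITED)
		return XMPL_CMD_SHARED_EXPEDITED;
	if (supported_mask & XMPL_CMD_SHARED)
		return XMPL_CMD_SHARED;
	return -1;
}

/* Issue a system-wide barrier, expedited if available. */
int xmpl_barrier_all(void)
{
	int cmd = xmpl_pick_cmd(xmpl_membarrier(XMPL_CMD_QUERY, 0));

	if (cmd < 0) {
		errno = ENOSYS;
		return -1;
	}
	return xmpl_membarrier(cmd, 0);
}
```

Because each command is a single bit, the query result can be tested with a plain bitwise AND, and a kernel without the expedited command simply falls back to the slow path.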

> The commit message here does not explain why it is OK real-time
> wise to expose this feature as a system call.

I figure that kernels providing that level of real-time response
will disable this, perhaps in a manner similar to that for NO_HZ_FULL.

Plus I intend to add your earlier IPI-all-threads-in-this-process
option, which will allow the people asking for this to do reasonable
testing.

Obviously, unless there are good test results and some level of user
enthusiasm, this patch goes nowhere.

Seem reasonable?

Thanx, Paul

> Thanks,
>
> Mathieu
>
>
> >
> > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> > ---
> > include/uapi/linux/membarrier.h | 11 +++++++++++
> > kernel/membarrier.c | 7 ++++++-
> > 2 files changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
> > index e0b108bd2624..ba36d8a6be61 100644
> > --- a/include/uapi/linux/membarrier.h
> > +++ b/include/uapi/linux/membarrier.h
> > @@ -40,6 +40,16 @@
> > * (non-running threads are de facto in such a
> > * state). This covers threads from all processes
> > * running on the system. This command returns 0.
> > + * @MEMBARRIER_CMD_SHARED_EXPEDITED: Execute a memory barrier on all
> > + * running threads, but in an expedited fashion.
> > + * Upon return from system call, the caller thread
> > + * is ensured that all running threads have passed
> > + * through a state where all memory accesses to
> > + * user-space addresses match program order between
> > + * entry to and return from the system call
> > + * (non-running threads are de facto in such a
> > + * state). This covers threads from all processes
> > + * running on the system. This command returns 0.
> > *
> > * Command to be passed to the membarrier system call. The commands need to
> > * be a single bit each, except for MEMBARRIER_CMD_QUERY which is assigned to
> > @@ -48,6 +58,7 @@
> >  enum membarrier_cmd {
> >  	MEMBARRIER_CMD_QUERY = 0,
> >  	MEMBARRIER_CMD_SHARED = (1 << 0),
> > +	MEMBARRIER_CMD_SHARED_EXPEDITED = (2 << 0),
> >  };
> >
> > #endif /* _UAPI_LINUX_MEMBARRIER_H */
> > diff --git a/kernel/membarrier.c b/kernel/membarrier.c
> > index 9f9284f37f8d..b749c39bb219 100644
> > --- a/kernel/membarrier.c
> > +++ b/kernel/membarrier.c
> > @@ -22,7 +22,8 @@
> > * Bitmask made from a "or" of all commands within enum membarrier_cmd,
> > * except MEMBARRIER_CMD_QUERY.
> > */
> > -#define MEMBARRIER_CMD_BITMASK (MEMBARRIER_CMD_SHARED)
> > +#define MEMBARRIER_CMD_BITMASK (MEMBARRIER_CMD_SHARED | \
> > + MEMBARRIER_CMD_SHARED_EXPEDITED)
> >
> > /**
> > * sys_membarrier - issue memory barriers on a set of threads
> > @@ -64,6 +65,10 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
> >  		if (num_online_cpus() > 1)
> >  			synchronize_sched();
> >  		return 0;
> > +	case MEMBARRIER_CMD_SHARED_EXPEDITED:
> > +		if (num_online_cpus() > 1)
> > +			synchronize_sched_expedited();
> > +		return 0;
> >  	default:
> >  		return -EINVAL;
> >  	}
> > --
> > 2.5.2
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
>