Re: [PATCH v14 for 4.1] sys_membarrier(): system-wide memory barrier (x86)

From: Mathieu Desnoyers
Date: Tue Apr 14 2015 - 15:23:04 EST


----- Original Message -----
> ----- Original Message -----
> > On Mon, 13 Apr 2015, Mathieu Desnoyers wrote:
> >
> > > [ Andrew, can you take this for the 4.1 merge window ? ]
> >
> > You probably mean 4.2, right?
>
> It can wait for 4.2 if you think it still need some work,
> of course. I was submitting it for 4.1 because I thought
> the discussion had settled down in the prior rounds.
>
> >
> > This fails the basic test for exposure in linux-next, adds syscalls
> > without the explicit ack of any x86 maintainer and exposes a user
> > space ABI with a magic undocumented flags argument.
>
> I guess for the undocumented part we need to add a manpage,
> right ?
>
> >
> > > This patch only adds the system call to x86.
> >
> > So the changes to
> >
> > > include/uapi/asm-generic/unistd.h | 4 +-
> >
> > are just cosmetic, right?
>
> Hrm, right, those are not. I did not get the full impact
> of adding it into this generic header. This means arc, arm64,
> c6x, hexagon, metag, nios2, openrisc, score, tile,
> unicore32 are also getting this system call.
>
> So the question would be: should we introduce this syscall
> in different patches for each architecture, or should
> we add them all in one go ? There is nothing fundamentally
> x86-specific to the implementation of this system call.
>
> >
> > > +/* System call membarrier "flags" argument. */
> > > +enum {
> > > + /*
> > > + * Query whether the rest of the specified flags are supported,
> > > + * without performing synchronization.
> > > + */
> >
> > Docbook has support for enums.
>
> OK, how about the following ?
>
> /**
> * enum membarrier_flags - membarrier system call "flags" argument bitmask.
> *
> * Bitmask of flags to be passed to the membarrier system call. The "flags"
> * parameter can be either 0 or many of those flags combined with a bitwise
> * "or".
> */
>
> >
> > > + MEMBARRIER_QUERY = (1 << 31),
> > > +};
> >
> > Why is this an anonymous enum?
>
> Because I used a "int" type for the syscall flags argument, so I did
> not need to name it. Giving it a name does indeed help making the
> documentation clearer anyway, so I'm tempted to give it a name.
>
> >
> > > +#ifdef CONFIG_SMP
> >
> > So documentation is SMP only, right?
>
> Yeah, those embedded folks still doing UP don't need documentation. ;-)
>
> I'll fix it.
>
> >
> > > +/*
> >
> > Docbook comments start with "/**"
>
> Fixed.
>
> >
> > > + * sys_membarrier - issue memory barrier on all running threads
> > > + * @flags: MEMBARRIER_QUERY:
> > > + * Query whether the rest of the specified flags are
> > > supported,
> > > + * without performing synchronization.
> >
> > @flags: Explain what this is for, not what a particular implemented
> > value is used for. The values should be proper documented
> > in the enum
> >
> > Why is this an int and not a named enum?
>
> In my reading of C99:
>
> ISO/IEC 9899:TC2
> 6.7.2.2 Enumeration specifiers
>
> "4 Each enumerated type shall be compatible with char, a signed integer type,
> or an
> unsigned integer type. The choice of type is implementation-defined,108) but
> shall be
> capable of representing the values of all the members of the enumeration. The
> enumerated type is incomplete until after the } that terminates the list of
> enumerator
> declarations."
>
> Since this type is used as a system call ABI, I'm worried that a compiler
> implementation may compile a user-space program with a enum representation
> of a "char", whereas the kernel would expect an integer, thus causing an
> ABI issue, where userland would fail to clear the high-order bits of the
> register, and a more recent kernel would care about those bits (if we add
> new flags in future kernel releases).
>
> >
> > Why is this named flags and not given a descriptive name? If I
> > understand your changelog correctly you want to implement other
> > synchronization modes than the current synchronize_sched. So mode
> > might be a proper name.
>
> If we look at other system calls like open(), "flags" describes a
> feature configuration, whereas "mode" is used to specify the
> permissions to use when creating the file. What we want to specify
> here is more a configuration of the syscall, which fits better with
> the existing semantic of open() "flags".
>
> >
> > Why is MEMBARRIER_QUERY not a proper operation mode and simply returns
> > the supported modes instead of doing it backwards and asking whether
> > a specific value is supported?
>
> That's an interesting way to do it. We'd have to be careful not to
> conflict with return values like -ENOSYS then. We could define
> MEMBARRIER_QUERY as (1 << 31), which effectively reserves all
> signed negative numbers for the QUERY flag, and then the return value
> of the QUERY flag could be either a negative value (e.g. -ENOSYS)
> or the set of flags supported (bits 0 to 30).

Thinking about it a bit more, one reason for doing the QUERY along
with the exact set of flags queried allow us to do more than just
returning which flags are supported: it allows us to tell userspace
whether the combination of flags used is valid or not.

For instance, if we add a MEMBARRIER_PRIVATE flag in a future release
to issue memory barriers only to other threads from the same process,
and we add a MEMBARRIER_EXPEDITED which uses IPIs to issue those
barriers, we could very well have a situation where using

EXPEDITED | PRIVATE would be valid (only sending IPIs to CPUs
running threads from the same process)

but

EXPEDITED alone would be invalid (-EINVAL), until we figure out
how to expedite memory barriers to all processors without impacting
other processes, if at all possible.

Using QUERY with an empty set of flags could however return the set of
flags supported, which could be a nice feature. Anyway, I think
the "0" flag should be the basic always correct configuration that
is always supported, otherwise we'd have -ENOSYS. Therefore, querying
whether the empty set of flags is supported has little value, other
than checking for -ENOSYS.

So considering the above, the typical use of this query method from
library initialization would be:

int supported_flags = sys_membarrier(MEMBARRIER_QUERY);

... check for -ENOSYS ....
... check whether the flags we need are supported ...

if (sys_membarrier(MEMBARRIER_QUERY | flag1 | flag2))
goto error;

then we are guaranteed that using sys_membarrier(flag1 | flag2)
will always succeed within the application, without needing to
handle errors every time it is used. This property is useful
to implement a synchronize_rcu() that returns "void" and simplify
error handling within the application.

Thoughts ?

Thanks,

Mathieu


>
> >
> > > + * On uniprocessor systems, this system call simply returns 0 after
> > > + * validating the arguments, so user-space knows it is implemented.
> >
> > And the exact point of knowing this is?
>
> I guess the alternative approach would be to return -ENOSYS, and let
> userspace discover that the kernel only support uniprocessor systems
> by other means ?
>
> >
> > > + */
> > > +SYSCALL_DEFINE1(membarrier, int, flags)
> > > +{
> > > + int retval;
> > > +
> > > + retval = membarrier_validate_flags(flags);
> > > + if (retval)
> > > + goto end;
> > > + if (unlikely(flags & MEMBARRIER_QUERY) || num_online_cpus() == 1)
> > > + goto end;
> >
> > So why not doing the obvious?
> >
> > enum modes {
> > QUERY = 0,
> > FULL_SYNC = 1 << 0,
> > };
> >
> > #define IMPLEMENTED_MODES (FULL_SYNC)
> >
> > switch (mode) {
> > case FULL_SYNC:
> > synchronize_sched_on_smp();
> > return 0;
> > case QUERY:
> > return IMPLEMENTED_MODES;
> > }
> > return -EINVAL;
> >
> > And later on you do
> >
> > enum modes {
> > QUERY = 0,
> > FULL_SYNC = 1 << 0,
> > + MAGIC_SYNC = 1 << 1,
> > };
> >
> > -#define IMPLEMENTED_MODES (FULL_SYNC)
> > +#define IMPLEMENTED_MODES (FULL_SYNC | MAGIC_SYNC)
> >
> > All you need to make sure is that any mode is a power of 2.
>
> I really prefer that the "default" of passing "0" does
> the obviously correct behavior for the system call:
> issuing a memory barrier over the entire system. Otherwise,
> making "0" a no-op could introduce hard-to-find races
> if people misuse the system call. Moreover, we need to
> reserve negative return values for errors like -ENOSYS, so
> we can use the high-order bit for the QUERY flag.
>
> In summary, how about the following:
>
> - "0" is the default, obviously correct flags parameter,
> which does a memory barrier on all processors.
> - 1 << 31 is the QUERY flag, which returns a bitmask of
> all flags supported.
> - We keep negative return values reserved for errors
> (e.g. -ENOSYS), even for the QUERY flag.
>
> Thanks for the feedback!
>
> Mathieu
>
>
>
> >
> > Hmm?
> >
> > Thanks,
> >
> > tglx
> >
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
>

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/