Re: [PATCH v14 for 4.1] sys_membarrier(): system-wide memory barrier (x86)

From: Mathieu Desnoyers
Date: Tue Apr 14 2015 - 15:02:30 EST


----- Original Message -----
> On Mon, 13 Apr 2015, Mathieu Desnoyers wrote:
>
> > [ Andrew, can you take this for the 4.1 merge window ? ]
>
> You probably mean 4.2, right?

It can wait for 4.2 if you think it still need some work,
of course. I was submitting it for 4.1 because I thought
the discussion had settled down in the prior rounds.

>
> This fails the basic test for exposure in linux-next, adds syscalls
> without the explicit ack of any x86 maintainer and exposes a user
> space ABI with a magic undocumented flags argument.

I guess for the undocumented part we need to add a manpage,
right ?

>
> > This patch only adds the system call to x86.
>
> So the changes to
>
> > include/uapi/asm-generic/unistd.h | 4 +-
>
> are just cosmetic, right?

Hrm, right, those are not. I did not get the full impact
of adding it into this generic header. This means arc, arm64,
c6x, hexagon, metag, nios2, openrisc, score, tile,
unicore32 are also getting this system call.

So the question would be: should we introduce this syscall
in different patches for each architecture, or should
we add them all in one go ? There is nothing fundamentally
x86-specific to the implementation of this system call.

>
> > +/* System call membarrier "flags" argument. */
> > +enum {
> > + /*
> > + * Query whether the rest of the specified flags are supported,
> > + * without performing synchronization.
> > + */
>
> Docbook has support for enums.

OK, how about the following ?

/**
* enum membarrier_flags - membarrier system call "flags" argument bitmask.
*
* Bitmask of flags to be passed to the membarrier system call. The "flags"
* parameter can be either 0 or many of those flags combined with a bitwise
* "or".
*/

>
> > + MEMBARRIER_QUERY = (1 << 31),
> > +};
>
> Why is this an anonymous enum?

Because I used a "int" type for the syscall flags argument, so I did
not need to name it. Giving it a name does indeed help making the
documentation clearer anyway, so I'm tempted to give it a name.

>
> > +#ifdef CONFIG_SMP
>
> So documentation is SMP only, right?

Yeah, those embedded folks still doing UP don't need documentation. ;-)

I'll fix it.

>
> > +/*
>
> Docbook comments start with "/**"

Fixed.

>
> > + * sys_membarrier - issue memory barrier on all running threads
> > + * @flags: MEMBARRIER_QUERY:
> > + * Query whether the rest of the specified flags are
> > supported,
> > + * without performing synchronization.
>
> @flags: Explain what this is for, not what a particular implemented
> value is used for. The values should be proper documented
> in the enum
>
> Why is this an int and not a named enum?

In my reading of C99:

ISO/IEC 9899:TC2
6.7.2.2 Enumeration specifiers

"4 Each enumerated type shall be compatible with char, a signed integer type, or an
unsigned integer type. The choice of type is implementation-defined,108) but shall be
capable of representing the values of all the members of the enumeration. The
enumerated type is incomplete until after the } that terminates the list of enumerator
declarations."

Since this type is used as a system call ABI, I'm worried that a compiler
implementation may compile a user-space program with a enum representation
of a "char", whereas the kernel would expect an integer, thus causing an
ABI issue, where userland would fail to clear the high-order bits of the
register, and a more recent kernel would care about those bits (if we add
new flags in future kernel releases).

>
> Why is this named flags and not given a descriptive name? If I
> understand your changelog correctly you want to implement other
> synchronization modes than the current synchronize_sched. So mode
> might be a proper name.

If we look at other system calls like open(), "flags" describes a
feature configuration, whereas "mode" is used to specify the
permissions to use when creating the file. What we want to specify
here is more a configuration of the syscall, which fits better with
the existing semantic of open() "flags".

>
> Why is MEMBARRIER_QUERY not a proper operation mode and simply returns
> the supported modes instead of doing it backwards and asking whether
> a specific value is supported?

That's an interesting way to do it. We'd have to be careful not to
conflict with return values like -ENOSYS then. We could define
MEMBARRIER_QUERY as (1 << 31), which effectively reserves all
signed negative numbers for the QUERY flag, and then the return value
of the QUERY flag could be either a negative value (e.g. -ENOSYS)
or the set of flags supported (bits 0 to 30).

>
> > + * On uniprocessor systems, this system call simply returns 0 after
> > + * validating the arguments, so user-space knows it is implemented.
>
> And the exact point of knowing this is?

I guess the alternative approach would be to return -ENOSYS, and let
userspace discover that the kernel only support uniprocessor systems
by other means ?

>
> > + */
> > +SYSCALL_DEFINE1(membarrier, int, flags)
> > +{
> > + int retval;
> > +
> > + retval = membarrier_validate_flags(flags);
> > + if (retval)
> > + goto end;
> > + if (unlikely(flags & MEMBARRIER_QUERY) || num_online_cpus() == 1)
> > + goto end;
>
> So why not doing the obvious?
>
> enum modes {
> QUERY = 0,
> FULL_SYNC = 1 << 0,
> };
>
> #define IMPLEMENTED_MODES (FULL_SYNC)
>
> switch (mode) {
> case FULL_SYNC:
> synchronize_sched_on_smp();
> return 0;
> case QUERY:
> return IMPLEMENTED_MODES;
> }
> return -EINVAL;
>
> And later on you do
>
> enum modes {
> QUERY = 0,
> FULL_SYNC = 1 << 0,
> + MAGIC_SYNC = 1 << 1,
> };
>
> -#define IMPLEMENTED_MODES (FULL_SYNC)
> +#define IMPLEMENTED_MODES (FULL_SYNC | MAGIC_SYNC)
>
> All you need to make sure is that any mode is a power of 2.

I really prefer that the "default" of passing "0" does
the obviously correct behavior for the system call:
issuing a memory barrier over the entire system. Otherwise,
making "0" a no-op could introduce hard-to-find races
if people misuse the system call. Moreover, we need to
reserve negative return values for errors like -ENOSYS, so
we can use the high-order bit for the QUERY flag.

In summary, how about the following:

- "0" is the default, obviously correct flags parameter,
which does a memory barrier on all processors.
- 1 << 31 is the QUERY flag, which returns a bitmask of
all flags supported.
- We keep negative return values reserved for errors
(e.g. -ENOSYS), even for the QUERY flag.

Thanks for the feedback!

Mathieu



>
> Hmm?
>
> Thanks,
>
> tglx
>

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/