Re: [PATCH] uapi: futex: Add a futex syscall

From: Alistair Francis
Date: Sun Oct 17 2021 - 23:42:33 EST


On Fri, Oct 15, 2021 at 6:24 PM Arnd Bergmann <arnd@xxxxxxxx> wrote:
>
> On Fri, Oct 15, 2021 at 2:59 AM Alistair Francis
> <alistair.francis@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > From: Alistair Francis <alistair.francis@xxxxxxx>
> >
> > This commit adds two futex syscall wrappers that are exposed to
> > userspace.
> >
> > Neither the kernel or glibc currently expose a futex wrapper, so
> > userspace is left performing raw syscalls. This has mostly been becuase
> > the overloading of one of the arguments makes it impossible to provide a
> > single type safe function.
> >
> > Until recently the single syscall has worked fine. With the introduction
> > of a 64-bit time_t futex call on 32-bit architectures, this has become
> > more complex. The logic of handling the two possible futex syscalls is
> > complex and often implemented incorrectly.
> >
> > This patch adds two futux syscall functions that correctly handle the
> > time_t complexity for userspace.
> >
> > This idea is based on previous discussions: https://lkml.org/lkml/2021/9/21/143
> >
> > Signed-off-by: Alistair Francis <alistair.francis@xxxxxxx>
>
> Thanks a lot for taking this on!

No worries!

>
> The implementation looks good to me, I've just noted some details below.
>
> The big question here is the exact one that has kept the wrappers out of
> glibc and musl for now: What should the function prototype(s) be?
>
> I see you went with having two distinct function names for the two
> sets of argument types, which I think was what the glibc developers
> were leaning towards.
>
> I think it would be nicer to use a transparent union like
>
> typedef union __attribute__((__transparent_inion__)) {
> struct timespec *timeout;
> u_int32_t nr_requeue;
> } __futex_arg4;
>
> which would let us provide a single function for both variants.
> The main downside is that this relies on a GCC extension, but I
> don't expect that to be a problem since any code using this is
> already nonportable.

I don't love this. Relying on a GCC extension doesn't seem like a great idea.

As well as that we need to split the types out to check the timeout
value, and I'm not sure of how we could do that. We could have a
switch case on the futex op, but that seems likely to get out of sync.

>
> The other question here is what namespace to use. You went for the
> non-prefixed futex_syscall, which is clearly nicer to use in an application,
> but it might clash with libc providing the same function name in the
> future. We could consider using __kernel_futex() as the function name,
> putting it into a private namespace for the kernel headers that could
> then get wrapped again by an application or libc using it.

Sounds like a good idea, I'll update this.

>
> > +#include <errno.h>
> > +#include <linux/types.h>
> > +#include <linux/time_types.h>
>
> I wonder if this should also #include <asm/unistd.h> and
> <sys/syscall.h>, or if you
> can leave that to the caller to include either <asm/unistd.h.> or <unistd.h>.

I think the kernel test bot answered this question :)

I have added both.

>
> > +/**
> > + * futex_syscall_timeout() - __NR_futex/__NR_futex_time64 syscall wrapper
> > + * @uaddr: address of first futex
> > + * @op: futex op code
> > + * @val: typically expected value of uaddr, but varies by op
> > + * @timeout: an absolute struct timespec
> > + * @uaddr2: address of second futex for some ops
> > + * @val3: varies by op
> > + */
> > +static inline int
> > +futex_syscall_timeout(volatile u_int32_t *uaddr, int op, u_int32_t val,
> > + struct timespec *timeout, volatile u_int32_t *uaddr2, int val3)
> > +{
> > +#if defined(__NR_futex_time64)
> > + if (sizeof(*timeout) != sizeof(struct __kernel_old_timespec)) {
> > + int ret = syscall(__NR_futex_time64, uaddr, op, val, timeout, uaddr2, val3);
> > +
> > + if (ret == 0 || errno != ENOSYS)
> > + return ret;
> > + }
> > +#endif
> > +
> > +#if defined(__NR_futex)
> > + if (sizeof(*timeout) == sizeof(struct __kernel_old_timespec))
> > + return syscall(__NR_futex, uaddr, op, val, timeout, uaddr2, val3);
> > +
> > + if (timeout && timeout->tv_sec == (long)timeout->tv_sec) {
>
> Not sure if we actually need to reject timeouts that overflow, should probably
> check what glibc does for this case. Intuitively, I'd go with setting
> tv_sec=LONG_MAX.

gblic generally rejects these with EOVERFLOW, see:

https://github.com/bminor/glibc/blob/595c22ecd8e87a27fd19270ed30fdbae9ad25426/sysdeps/unix/sysv/linux/clock_settime.c#L46
https://github.com/bminor/glibc/blob/595c22ecd8e87a27fd19270ed30fdbae9ad25426/sysdeps/unix/sysv/linux/clock_adjtime.c#L40

>
> > + struct __kernel_old_timespec ts32;
> > +
> > + ts32.tv_sec = (__kernel_long_t) timeout->tv_sec;
> > + ts32.tv_nsec = (__kernel_long_t) timeout->tv_nsec;
> > +
> > + return syscall(__NR_futex, uaddr, op, val, ts32, uaddr2, val3);
>
> You need to pass 'ts32' by reference here.

Thanks! Fixed.

Alistair

>
> Arnd