Re: [PATCH] futex: send SIGBUS if argument is not aligned on a four-byte boundary

From: Carlos O'Donell
Date: Fri May 15 2020 - 13:50:26 EST


On 5/15/20 12:27 PM, Peter Zijlstra wrote:
> On Fri, May 15, 2020 at 06:36:47PM +0300, Konstantin Khlebnikov wrote:
>> Userspace implementations of mutexes (including glibc) in some cases
>> retries operation without checking error code from syscall futex.
>> This is good for performance because most errors are impossible when
>> locking code trusts itself.

In newer versions of glibc, which won't solve this problem for older
distributions (or newer glibc on older kernels), we've refactored all
of this code into futex-internal.h and do things like this (example
from one of the generic internal interfaces for futex use):

149 case -ETIMEDOUT: /* Cannot have happened as we provided no timeout. */
150 case -EFAULT: /* Must have been caused by a glibc or application bug. */
151 case -EINVAL: /* Either due to wrong alignment or due to the timeout not
152 being normalized. Must have been caused by a glibc or
153 application bug. */
154 case -ENOSYS: /* Must have been caused by a glibc bug. */
155 /* No other errors are documented at this time. */
156 default:
157 futex_fatal_error ();
158 }

Several of the pthread interfaces are using this code so they won't suffer
from "stuck EINVAL loops" like below.

We worked with all the interested parties to get `man 2 futex` updated
with the expected semantics and error return codes.

We don't want to ignore what the kernel is returning and we terminate
the process without propagating that error upwards for the simple
API cases.

Likewise note the "default:" which means if we get new futex error that
is not documented we also terminate the process.

>> Some errors which could came from outer code are handled automatically,
>> for example invalid address triggers SIGSEGV on atomic fast path.
>>
>> But one case turns into nasty busy-loop: when address is unaligned.
>> futex(FUTEX_WAIT) returns EINVAL immediately and loop goes to retry.
>>
>> Example which loops inside second call rather than hung peacefully:
>>
>> #include <stdlib.h>
>> #include <pthread.h>
>>
>> int main(int argc, char **argv)
>> {
>> char buf[sizeof(pthread_mutex_t) + 1];
>> pthread_mutex_t *mutex = (pthread_mutex_t *)(buf + 1);
>>
>> pthread_mutex_init(mutex, NULL);
>> pthread_mutex_lock(mutex);
>> pthread_mutex_lock(mutex);
>> }

This isn't fixed because this is the older code in pthread_mutex_lock
which we haven't ported to futex-internal.h yet, otherwise we would abort
the process.

A quick change to use the newer interface (futex_wait_simple), and the
example above fails as expected:

./test
The futex facility returned an unexpected error code.
Aborted (core dumped)

And it does not loop. I'm open to bikeshed on the existing error message
(which has been there since 2014 / commit a2f0363f817).

coredumpctl debug loop-futex/test

Core was generated by `./test'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
49 return ret;
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1 0x00007f1cac0d2872 in __GI_abort () at abort.c:79
#2 0x00007f1cac12a248 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f1cac234a57 "%s")
at ../sysdeps/posix/libc_fatal.c:155
#3 0x00007f1cac12a27a in __GI___libc_fatal (
message=message@entry=0x7f1cac288000 "The futex facility returned an unexpected error code.\n")
at ../sysdeps/posix/libc_fatal.c:164
#4 0x00007f1cac283fdc in futex_fatal_error () at ../sysdeps/nptl/futex-internal.h:157
#5 futex_wait (private=<optimized out>, expected=2, futex_word=0x7f1cac283fdc <__lll_lock_wait+92>)
at ../sysdeps/nptl/futex-internal.h:157
#6 futex_wait_simple (private=<optimized out>, expected=2, futex_word=0x7f1cac283fdc <__lll_lock_wait+92>)
at ../sysdeps/nptl/futex-internal.h:172
#7 __lll_lock_wait (futex=futex@entry=0x7ffdb1f0a2c1, private=<optimized out>) at lowlevellock.c:53
#8 0x00007f1cac27cbf3 in __GI___pthread_mutex_lock (mutex=0x7ffdb1f0a2c1) at ../nptl/pthread_mutex_lock.c:80
#9 0x000000000040117a in main (argc=1, argv=0x7ffdb1f0a3f8) at test.c:11

So semantically the kernel change makes sense, and will terminate the
process for glibc today, and matches what the refactored glibc code
will do in userspace for more of the interfaces in the future.

>> It seems there is no practical usage for calling syscall futex for
>> unaligned address. This may be only bug in user space. Let's help
>> and handle this gracefully without adding extra code on fast path.

The only use case I could see is retroactively adding a futex to the
existing ABI of a structure and wanting to avoid padding. That does
not seem like a common enough use case that we would want to support.
To get efficient cache-line usage you'll want to pack things by hand.

>> This patch sends SIGBUS signal to slay task and break busy-loop.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
>> Reported-by: Maxim Samoylov <max7255@xxxxxxxxxxxxxx>
>
> Seems like a sensible idea to me.

Please do try to update the linux kernel man pages update to note
the change in behaviour and the version and commit of the released
kernel where this changed.

Please keep `man 2 futex` as accurate as possible for userspace
libc implementations.

Thanks.

> Acked-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
>
>> ---
>> kernel/futex.c | 13 ++++++++++++-
>> 1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/futex.c b/kernel/futex.c
>> index b59532862bc0..8a6d35fa56bc 100644
>> --- a/kernel/futex.c
>> +++ b/kernel/futex.c
>> @@ -508,10 +508,21 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a
>>
>> /*
>> * The futex address must be "naturally" aligned.
>> + * Also send signal to break busy-loop if user-space ignore error.
>> + * EFAULT case should trigger SIGSEGV at access from user-space.
>> */
>> key->both.offset = address % PAGE_SIZE;
>> - if (unlikely((address % sizeof(u32)) != 0))
>> + if (unlikely((address % sizeof(u32)) != 0)) {
>> + struct kernel_siginfo info;
>> +
>> + clear_siginfo(&info);
>> + info.si_signo = SIGBUS;
>> + info.si_code = BUS_ADRALN;
>> + info.si_addr = uaddr;
>> + force_sig_info(&info);
>> +
>> return -EINVAL;
>> + }
>> address -= key->both.offset;
>>
>> if (unlikely(!access_ok(uaddr, sizeof(u32))))
>>
>

--
Cheers,
Carlos.