[PATCH v5 00/11] Add futex2 syscalls

From: André Almeida
Date: Thu Jul 08 2021 - 20:13:57 EST


This patchset is an implementation of futex2 interface on top of existing
futex.c code.

* What happened to the current futex()?

The futex() is implemented using a multiplexed interface that doesn't
scale well and gives headaches to people. We don't want to add more
features there.

* New features at futex2()

** NUMA-awareness

At the current implementation, all futex kernel side infrastructure is
stored on a single node. Given that, all futex() calls issued by
processors that aren't located on that node will have a memory access
penalty when doing it.

** Variable sized futexes

Futexes are used to implement atomic operations in userspace.
Supporting 8, 16, 32 and 64 bit sized futexes allows user libraries to
implement all those sizes in a performant way. Thanks Boost devs for
feedback: https://lists.boost.org/Archives/boost/2021/05/251508.php

Embedded systems or anything with memory constrains could benefit of
using smaller sizes for the futex userspace integer.

** Wait on multiple futexes

Proton's (a set of compatibility tools to run Windows games) fork of Wine
benefits of this feature to implement WaitForMultipleObjects from Win32 in
a performant way. Native game engines will benefit from this as well,
given that this is a common wait pattern for games.

* The interface

The new interface has one syscall per operation as opposed to the
current multiplexing one. The details can be found in the following
patches, but this is a high level summary of what the interface can do:

- Supports wake/wait semantics, as in futex()
- Supports requeue operations, similarly as FUTEX_CMP_REQUEUE, but with
individual flags for each address
- Supports waiting for a vector of futexes, using a new syscall named
futex_waitv()

- The following features will be implemented in next patchset versions:
- Supports variable sized futexes (8bits, 16bits, 32bits and 64bits)
- Supports NUMA-awareness operations, where the user can specify on
which memory node would like to operate

* The patchset

Given that futex2 reuses futex code, the patches make futex.c functions
public and modify them as needed.

This patchset can be also found at my git tree:

https://gitlab.collabora.com/tonyk/linux/-/tree/futex2-dev

- Patch 1: Implements 32bit wait/wake

- Patches 2-3: Implement waitv and requeue.

- Patch 4: Add a documentation file which details the interface and
the internal implementation.

- Patches 5-10: Selftests for all operations along with perf
support for futex2.

- Patch 11: Proof of concept of waking threads at waitpid(), not to be
merged as it is.

* Testing

** Stability

- glibc[1]: nptl's low level locking was modified to use futex2 API
(except for PI). All nptl/ tests passed.

- Proton's Wine: Proton/Wine was modified in order to use futex2() for the
emulation of Windows NT sync mechanisms based on futex, called "fsync".
Triple-A games with huge CPU's loads and tons of parallel jobs worked
as expected when compared with the previous FUTEX_WAIT_MULTIPLE
implementation at futex(). Some games issue 42k futex2() calls
per second.

- perf: The perf benchmarks tests can also be used to stress the
interface, and they can be found in this patchset.

[1] https://gitlab.collabora.com/tonyk/glibc/-/tree/futex2-dev

** Performance

- Using perf, no significant difference was measured when comparing
futex() and futex2() for the following benchmarks: hash, wake and
wake-parallel.

- I measured a 15% overhead for the perf's requeue benchmark, comparing
futex2() to futex(). Requeue patch provides more details about why this
happens and how to overcome this.

* Changelog

Changes from v4:
- Use existing futex.c code when possible
- Cleaned up cover letter, check v4 for a more verbose version
v4: https://lore.kernel.org/lkml/20210603195924.361327-1-andrealmeid@xxxxxxxxxxxxx/

André Almeida (11):
futex2: Implement wait and wake functions
futex2: Implement vectorized wait
futex2: Implement requeue operation
docs: locking: futex2: Add documentation
selftests: futex2: Add wake/wait test
selftests: futex2: Add timeout test
selftests: futex2: Add wouldblock test
selftests: futex2: Add waitv test
selftests: futex2: Add requeue test
perf bench: Add futex2 benchmark tests
kernel: Enable waitpid() for futex2

Documentation/locking/futex2.rst | 185 ++++++
Documentation/locking/index.rst | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 4 +
arch/x86/entry/syscalls/syscall_64.tbl | 4 +
include/linux/compat.h | 23 +
include/linux/futex.h | 103 ++++
include/linux/syscalls.h | 8 +
include/uapi/asm-generic/unistd.h | 11 +-
include/uapi/linux/futex.h | 27 +
init/Kconfig | 7 +
kernel/Makefile | 1 +
kernel/fork.c | 2 +
kernel/futex.c | 111 +---
kernel/futex2.c | 566 ++++++++++++++++++
kernel/sys_ni.c | 9 +
tools/arch/x86/include/asm/unistd_64.h | 12 +
tools/perf/bench/bench.h | 4 +
tools/perf/bench/futex-hash.c | 24 +-
tools/perf/bench/futex-requeue.c | 57 +-
tools/perf/bench/futex-wake-parallel.c | 41 +-
tools/perf/bench/futex-wake.c | 37 +-
tools/perf/bench/futex.h | 47 ++
tools/perf/builtin-bench.c | 18 +-
.../selftests/futex/functional/.gitignore | 3 +
.../selftests/futex/functional/Makefile | 6 +-
.../futex/functional/futex2_requeue.c | 164 +++++
.../selftests/futex/functional/futex2_wait.c | 195 ++++++
.../selftests/futex/functional/futex2_waitv.c | 154 +++++
.../futex/functional/futex_wait_timeout.c | 24 +-
.../futex/functional/futex_wait_wouldblock.c | 33 +-
.../testing/selftests/futex/functional/run.sh | 6 +
.../selftests/futex/include/futex2test.h | 112 ++++
32 files changed, 1865 insertions(+), 134 deletions(-)
create mode 100644 Documentation/locking/futex2.rst
create mode 100644 kernel/futex2.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_requeue.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_wait.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_waitv.c
create mode 100644 tools/testing/selftests/futex/include/futex2test.h

--
2.32.0