[RFC PATCH v1 0/6] sched: NUMA-aware concurrency IDs

From: Mathieu Desnoyers
Date: Fri Aug 23 2024 - 15:00:53 EST


This series addresses the poor NUMA locality of accesses to data
structures indexed by concurrency IDs. For example, when a process has
two threads that periodically run one after the other on different NUMA
nodes, each is assigned mm_cid=0. As a consequence, both threads access
the same pages, so at least one of them must perform remote NUMA
accesses, which is inefficient.
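
To make the scenario concrete, here is a minimal user-space sketch of a
purely compact allocator that always hands out the lowest free ID. It
is an illustration only, not the kernel code: compact_cid_pool,
compact_cid_get() and compact_cid_put() are hypothetical names, and the
pool size is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_CID 8	/* arbitrary size for this illustration */

/* Hypothetical stand-in for the per-process set of concurrency IDs. */
struct compact_cid_pool {
	bool used[SKETCH_MAX_CID];
};

/* Return the lowest free ID, ignoring where the caller runs; -1 if full. */
static int compact_cid_get(struct compact_cid_pool *pool)
{
	for (int cid = 0; cid < SKETCH_MAX_CID; cid++) {
		if (!pool->used[cid]) {
			pool->used[cid] = true;
			return cid;
		}
	}
	return -1;
}

/* Release an ID when its thread stops running. */
static void compact_cid_put(struct compact_cid_pool *pool, int cid)
{
	assert(cid >= 0 && cid < SKETCH_MAX_CID);
	pool->used[cid] = false;
}
```

If thread A (on node 0) calls compact_cid_get(), releases the ID, and
thread B (on node 1) then calls compact_cid_get(), both receive 0, so
any per-ID data ends up shared across the two nodes.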

Solve this by making the rseq concurrency ID (mm_cid) NUMA-aware. On
NUMA systems, when a NUMA-aware concurrency ID is observed by user-space
to be associated with a NUMA node, guarantee that it never changes NUMA
node unless either a kernel-level NUMA configuration change happens, or
scheduler migrations end up migrating tasks across NUMA nodes.
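
One way to picture such a guarantee is an allocator that remembers which
node each ID was last handed out on. The sketch below is an assumption
made for illustration, not the implementation in this series:
numa_cid_state, numa_cid_get() and numa_cid_put() are hypothetical
names, the masks are plain 32-bit words rather than kernel bitmaps, and
the compactness fallback discussed in the next paragraph is left out.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_NODES	2
#define SKETCH_NR_CIDS	8	/* arbitrary ID range for this illustration */

/* Hypothetical per-process state. */
struct numa_cid_state {
	uint32_t in_use;			/* IDs held by running threads */
	uint32_t node_cids[SKETCH_NR_NODES];	/* IDs associated with each node */
};

/* Lowest set bit within the ID range, or -1 if there is none. */
static int sketch_first_bit(uint32_t mask)
{
	for (int i = 0; i < SKETCH_NR_CIDS; i++)
		if (mask & (UINT32_C(1) << i))
			return i;
	return -1;
}

/* Pick an ID for a thread running on @node; -1 if neither step finds one. */
static int numa_cid_get(struct numa_cid_state *s, int node)
{
	uint32_t any_node = 0;
	int cid;

	assert(node >= 0 && node < SKETCH_NR_NODES);
	for (int n = 0; n < SKETCH_NR_NODES; n++)
		any_node |= s->node_cids[n];

	/* Step 1: a free ID already associated with this node (and-not). */
	cid = sketch_first_bit(s->node_cids[node] & ~s->in_use);
	/* Step 2: a free ID not yet associated with any node (nor). */
	if (cid < 0)
		cid = sketch_first_bit(~(s->in_use | any_node));
	if (cid < 0)
		return -1;

	s->node_cids[node] |= UINT32_C(1) << cid;
	s->in_use |= UINT32_C(1) << cid;
	return cid;
}

/* Release an ID; its node association is kept for later reuse. */
static void numa_cid_put(struct numa_cid_state *s, int cid)
{
	assert(cid >= 0 && cid < SKETCH_NR_CIDS);
	s->in_use &= ~(UINT32_C(1) << cid);
}
```

With this sketch, a thread that gets and releases an ID on node 0
leaves ID 0 associated with node 0, so a later request from node 1
receives ID 1 instead of reusing ID 0.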

There is a tradeoff between NUMA locality and compactness of the
concurrency ID allocation. Favor compactness over NUMA locality when
the scheduler migrates tasks across NUMA nodes, as such migrations do
not cause frequent remote NUMA accesses. This is done by limiting the
concurrency ID range to the minimum of the number of threads belonging
to the process and the number of allowed CPUs.
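
The bound described above amounts to a simple minimum. The helper below
is a hypothetical illustration (mm_cid_range_limit() is not a function
from this series), assuming both counts are at least one:

```c
#include <assert.h>

/*
 * Hypothetical helper: upper bound on the concurrency ID range, taken
 * as the smaller of the thread count and the allowed CPU count.
 */
static unsigned int mm_cid_range_limit(unsigned int nr_threads,
				       unsigned int nr_cpus_allowed)
{
	assert(nr_threads > 0 && nr_cpus_allowed > 0);
	return nr_threads < nr_cpus_allowed ? nr_threads : nr_cpus_allowed;
}
```

For example, a process with 2 threads allowed to run on 64 CPUs is
limited to 2 IDs, while a process with 128 threads restricted to 8 CPUs
is limited to 8 IDs.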

This series applies on top of v6.10.6.

Changes since v0: applied the changes requested by Yury Norov, and added
Reviewed-by tag from Shuah Khan for selftests changes. Rebased on
v6.10.6.

Cc: Valentin Schneider <vschneid@xxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
Cc: Ben Segall <bsegall@xxxxxxxxxx>
Cc: Yury Norov <yury.norov@xxxxxxxxx>
Cc: Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx>
Cc: Shuah Khan <skhan@xxxxxxxxxxxxxxxxxxx>

Mathieu Desnoyers (6):
lib: Clarify comment on top of find_next_andnot_bit
lib: Implement find_{first,next,nth}_nor_bit, find_first_andnot_bit
cpumask: Implement cpumask_{first,next}_{nor,andnot}
sched: NUMA-aware per-memory-map concurrency IDs
selftests/rseq: x86: Implement rseq_load_u32_u32
selftests/rseq: Implement NUMA node id vs mm_cid invariant test

include/linux/cpumask.h | 60 ++++++++
include/linux/find.h | 119 ++++++++++++++-
include/linux/mm_types.h | 57 ++++++-
kernel/sched/core.c | 10 +-
kernel/sched/sched.h | 139 +++++++++++++++--
lib/find_bit.c | 36 +++++
tools/testing/selftests/rseq/.gitignore | 1 +
tools/testing/selftests/rseq/Makefile | 2 +-
.../testing/selftests/rseq/basic_numa_test.c | 144 ++++++++++++++++++
tools/testing/selftests/rseq/rseq-x86-bits.h | 43 ++++++
tools/testing/selftests/rseq/rseq.h | 14 ++
11 files changed, 604 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/rseq/basic_numa_test.c

--
2.39.2