On Mon, 10 Mar 2025 at 16:41, Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
On 2025-03-10 10:43, Dmitry Vyukov wrote:
On Mon, 10 Mar 2025 at 15:38, Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
On 2025-03-10 10:36, Dmitry Vyukov wrote:
On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
On 2025-02-27 09:03, Dmitry Vyukov wrote:
Add a test that ensures that PKEY-protected struct rseq_cs
works and does not lead to process kills.
Signed-off-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Aruna Ramakrishna <aruna.ramakrishna@xxxxxxxxxx>
Cc: x86@xxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
Acked-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
---
Changes in v5:
- Use static for variables/functions
- Use RSEQ_READ/WRITE_ONCE instead of volatile
Changes in v4:
- Added Fixes tag
Changes in v3:
- added Acked-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
- rework the test to work when only pkey 0 is supported for rseq
Changes in v2:
- change test to install protected rseq_cs instead of rseq
---
tools/testing/selftests/rseq/Makefile | 2 +-
tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
tools/testing/selftests/rseq/rseq.h | 1 +
3 files changed, 100 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 5a3432fceb586..9111d25fea3af 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
param_test_benchmark param_test_compare_twice param_test_mm_cid \
- param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
+ param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
TEST_GEN_PROGS_EXTENDED = librseq.so
diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
new file mode 100644
index 0000000000000..cc4dd98190942
--- /dev/null
+++ b/tools/testing/selftests/rseq/pkey_test.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: LGPL-2.1
+/*
+ * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
+ */
+
+#define _GNU_SOURCE
+#include <err.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <ucontext.h>
+#include <unistd.h>
+
+#include "rseq.h"
+#include "rseq-abi.h"
+
+static int pkey;
+static ucontext_t ucp0, ucp1;
+
+static void coroutine(void)
+{
+ int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
+ /*
+ * When we disable access to pkey 0, globals and TLS become
+ * inaccessible too, so we need to tread carefully.
+ * Pkey is global so we need to copy it onto the stack.
+ */
+ int pk = RSEQ_READ_ONCE(pkey);
+ struct timespec ts;
+
+ orig_pk0 = pkey_get(0);
+ if (pkey_set(0, PKEY_DISABLE_ACCESS))
+ err(1, "pkey_set failed");
+ old_pk0 = pkey_get(0);
+ old_pk1 = pkey_get(pk);
+
+ /*
+ * Prevent compiler from initializing it by loading a 16-global.
+ */
+ RSEQ_WRITE_ONCE(ts.tv_sec, 0);
+ RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
+ /*
+ * If the kernel misbehaves, context switches in the following loop
+ * will terminate the process with SIGSEGV.
+ * Trigger preemption w/o accessing TLS.
+ * Note that glibc's usleep touches errno always.
+ */
+ for (i = 0; i < 10; i++)
+ syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
+
+ pk0 = pkey_get(0);
+ pk1 = pkey_get(pk);
+ if (pkey_set(0, orig_pk0))
+ err(1, "pkey_set failed");
+
+ /*
+ * Ensure that the kernel has restored the previous value of pkeys
+ * register after changing them.
+ */
+ if (old_pk0 != pk0)
+ errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
+ if (old_pk1 != pk1)
+ errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
+
+ swapcontext(&ucp1, &ucp0);
+ abort();
+}
+
+int main(int argc, char **argv)
+{
+ pkey = pkey_alloc(0, 0);
+ if (pkey == -1) {
+ printf("[SKIP]\tKernel does not support PKEYs: %s\n",
+ strerror(errno));
+ return 0;
+ }
+
+ if (rseq_register_current_thread())
+ err(1, "rseq_register_current_thread failed");
+
+ if (getcontext(&ucp1))
+ err(1, "getcontext failed");
+ ucp1.uc_stack.ss_size = getpagesize() * 4;
+ ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
+ PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
+ if (ucp1.uc_stack.ss_sp == MAP_FAILED)
+ err(1, "mmap failed");
+ if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
+ PROT_READ | PROT_WRITE, pkey))
+ err(1, "pkey_mprotect failed");
+ makecontext(&ucp1, coroutine, 0);
+ if (swapcontext(&ucp0, &ucp1))
+ err(1, "swapcontext failed");
Should the rseq register be paired with a rseq unregister here ?
I dunno. It's necessary for this test and in general. Tcmalloc won't
unregister on thread exit. Even for a libc it may be a bad idea due to
signals.
The rseq register/unregister is only for the case where libc does not
support rseq. But if you want to use rseq_register_current_thread(),
then you'll want to pair it with unregister.
Why? Isn't it better to have tests more realitic?
If you use rseq.c rseq_register_current_thread without
rseq_unregister_current_thread, then you rely on implicit
unregistration done by the kernel at thread exit.
Then you need to ensure your userspace runtime keep the TLS
that holds the rseq area allocated for the entire execution
of the thread until it exits in the kernel. Note that
disabling signals is not sufficient to prevent the kernel
from accessing the rseq area.
GNU libc gets away with automatic unregistration at
thread exit because it completely controls the lifetime
of the registered rseq area, keeping it allocated for as
long as the thread is executing.
So in order to minimize the dependency on specific libc
behavior in the kernel sefltests, the selftests rseq.h
requires explicit registration *and* unregistration.
If libc registers rseq (!rseq_ownership), then
rseq_unregister_current_thread is a no-op. And if libc has not
registered rseq, then we are not relying on any libc behavior. I don't
see how calling rseq_unregister_current_thread helps to rely less on
libc. Am I missing something?
Thanks,
Mathieu
Thanks,
Mathieu
Thanks,
Mathieu
+ return 0;
+}
diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
index ba424ce80a719..65da4a727c550 100644
--- a/tools/testing/selftests/rseq/rseq.h
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -8,6 +8,7 @@
#ifndef RSEQ_H
#define RSEQ_H
+#include <assert.h>
#include <stdint.h>
#include <stdbool.h>
#include <pthread.h>
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com