Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys

From: Mathieu Desnoyers
Date: Mon Mar 10 2025 - 11:41:56 EST


On 2025-03-10 10:43, Dmitry Vyukov wrote:
On Mon, 10 Mar 2025 at 15:38, Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:

On 2025-03-10 10:36, Dmitry Vyukov wrote:
On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:

On 2025-02-27 09:03, Dmitry Vyukov wrote:
Add a test that ensures that PKEY-protected struct rseq_cs
works and does not lead to process kills.

Signed-off-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Aruna Ramakrishna <aruna.ramakrishna@xxxxxxxxxx>
Cc: x86@xxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
Acked-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")

---
Changes in v5:
- Use static for variables/functions
- Use RSEQ_READ/WRITE_ONCE instead of volatile

Changes in v4:
- Added Fixes tag

Changes in v3:
- added Acked-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
- rework the test to work when only pkey 0 is supported for rseq

Changes in v2:
- change test to install protected rseq_cs instead of rseq
---
tools/testing/selftests/rseq/Makefile | 2 +-
tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
tools/testing/selftests/rseq/rseq.h | 1 +
3 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 5a3432fceb586..9111d25fea3af 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1

TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
param_test_benchmark param_test_compare_twice param_test_mm_cid \
- param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
+ param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test

TEST_GEN_PROGS_EXTENDED = librseq.so

diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
new file mode 100644
index 0000000000000..cc4dd98190942
--- /dev/null
+++ b/tools/testing/selftests/rseq/pkey_test.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: LGPL-2.1
+/*
+ * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
+ */
+
+#define _GNU_SOURCE
+#include <err.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <ucontext.h>
+#include <unistd.h>
+
+#include "rseq.h"
+#include "rseq-abi.h"
+
+static int pkey;
+static ucontext_t ucp0, ucp1;
+
+static void coroutine(void)
+{
+ int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
+ /*
+ * When we disable access to pkey 0, globals and TLS become
+ * inaccessible too, so we need to tread carefully.
+ * Pkey is global so we need to copy it onto the stack.
+ */
+ int pk = RSEQ_READ_ONCE(pkey);
+ struct timespec ts;
+
+ orig_pk0 = pkey_get(0);
+ if (pkey_set(0, PKEY_DISABLE_ACCESS))
+ err(1, "pkey_set failed");
+ old_pk0 = pkey_get(0);
+ old_pk1 = pkey_get(pk);
+
+ /*
+ * Prevent compiler from initializing it by loading a 16-global.
+ */
+ RSEQ_WRITE_ONCE(ts.tv_sec, 0);
+ RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
+ /*
+ * If the kernel misbehaves, context switches in the following loop
+ * will terminate the process with SIGSEGV.
+ * Trigger preemption w/o accessing TLS.
+ * Note that glibc's usleep touches errno always.
+ */
+ for (i = 0; i < 10; i++)
+ syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
+
+ pk0 = pkey_get(0);
+ pk1 = pkey_get(pk);
+ if (pkey_set(0, orig_pk0))
+ err(1, "pkey_set failed");
+
+ /*
+ * Ensure that the kernel has restored the previous value of pkeys
+ * register after changing them.
+ */
+ if (old_pk0 != pk0)
+ errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
+ if (old_pk1 != pk1)
+ errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
+
+ swapcontext(&ucp1, &ucp0);
+ abort();
+}
+
+int main(int argc, char **argv)
+{
+ pkey = pkey_alloc(0, 0);
+ if (pkey == -1) {
+ printf("[SKIP]\tKernel does not support PKEYs: %s\n",
+ strerror(errno));
+ return 0;
+ }
+
+ if (rseq_register_current_thread())
+ err(1, "rseq_register_current_thread failed");
+
+ if (getcontext(&ucp1))
+ err(1, "getcontext failed");
+ ucp1.uc_stack.ss_size = getpagesize() * 4;
+ ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
+ PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
+ if (ucp1.uc_stack.ss_sp == MAP_FAILED)
+ err(1, "mmap failed");
+ if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
+ PROT_READ | PROT_WRITE, pkey))
+ err(1, "pkey_mprotect failed");
+ makecontext(&ucp1, coroutine, 0);
+ if (swapcontext(&ucp0, &ucp1))
+ err(1, "swapcontext failed");

Should the rseq register be paired with a rseq unregister here ?

I dunno. It's necessary for this test and in general. Tcmalloc won't
unregister on thread exit. Even for a libc it may be a bad idea due to
signals.

The rseq register/unregister is only for the case where libc does not
support rseq. But if you want to use rseq_register_current_thread(),
then you'll want to pair it with unregister.

Why? Isn't it better to have tests more realitic?

If you use rseq.c rseq_register_current_thread without
rseq_unregister_current_thread, then you rely on implicit
unregistration done by the kernel at thread exit.

Then you need to ensure your userspace runtime keep the TLS
that holds the rseq area allocated for the entire execution
of the thread until it exits in the kernel. Note that
disabling signals is not sufficient to prevent the kernel
from accessing the rseq area.

GNU libc gets away with automatic unregistration at
thread exit because it completely controls the lifetime
of the registered rseq area, keeping it allocated for as
long as the thread is executing.

So in order to minimize the dependency on specific libc
behavior in the kernel sefltests, the selftests rseq.h
requires explicit registration *and* unregistration.

Thanks,

Mathieu




Thanks,

Mathieu


Thanks,

Mathieu

+ return 0;
+}
diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
index ba424ce80a719..65da4a727c550 100644
--- a/tools/testing/selftests/rseq/rseq.h
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -8,6 +8,7 @@
#ifndef RSEQ_H
#define RSEQ_H

+#include <assert.h>
#include <stdint.h>
#include <stdbool.h>
#include <pthread.h>


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com