Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys

From: Dmitry Vyukov
Date: Mon Mar 10 2025 - 10:43:37 EST


On Mon, 10 Mar 2025 at 15:38, Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> On 2025-03-10 10:36, Dmitry Vyukov wrote:
> > On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
> > <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> >>
> >> On 2025-02-27 09:03, Dmitry Vyukov wrote:
> >>> Add a test that ensures that PKEY-protected struct rseq_cs
> >>> works and does not lead to process kills.
> >>>
> >>> Signed-off-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> >>> Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> >>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> >>> Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
> >>> Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
> >>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> >>> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> >>> Cc: Borislav Petkov <bp@xxxxxxxxx>
> >>> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> >>> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> >>> Cc: Aruna Ramakrishna <aruna.ramakrishna@xxxxxxxxxx>
> >>> Cc: x86@xxxxxxxxxx
> >>> Cc: linux-kernel@xxxxxxxxxxxxxxx
> >>> Acked-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> >>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
> >>>
> >>> ---
> >>> Changes in v5:
> >>> - Use static for variables/functions
> >>> - Use RSEQ_READ/WRITE_ONCE instead of volatile
> >>>
> >>> Changes in v4:
> >>> - Added Fixes tag
> >>>
> >>> Changes in v3:
> >>> - added Acked-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> >>> - rework the test to work when only pkey 0 is supported for rseq
> >>>
> >>> Changes in v2:
> >>> - change test to install protected rseq_cs instead of rseq
> >>> ---
> >>> tools/testing/selftests/rseq/Makefile | 2 +-
> >>> tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
> >>> tools/testing/selftests/rseq/rseq.h | 1 +
> >>> 3 files changed, 100 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
> >>> index 5a3432fceb586..9111d25fea3af 100644
> >>> --- a/tools/testing/selftests/rseq/Makefile
> >>> +++ b/tools/testing/selftests/rseq/Makefile
> >>> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
> >>>
> >>> TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
> >>> param_test_benchmark param_test_compare_twice param_test_mm_cid \
> >>> - param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
> >>> + param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
> >>>
> >>> TEST_GEN_PROGS_EXTENDED = librseq.so
> >>>
> >>> diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
> >>> new file mode 100644
> >>> index 0000000000000..cc4dd98190942
> >>> --- /dev/null
> >>> +++ b/tools/testing/selftests/rseq/pkey_test.c
> >>> @@ -0,0 +1,98 @@
> >>> +// SPDX-License-Identifier: LGPL-2.1
> >>> +/*
> >>> + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
> >>> + */
> >>> +
> >>> +#define _GNU_SOURCE
> >>> +#include <err.h>
> >>> +#include <errno.h>
> >>> +#include <stdio.h>
> >>> +#include <stdlib.h>
> >>> +#include <string.h>
> >>> +#include <sys/mman.h>
> >>> +#include <sys/syscall.h>
> >>> +#include <ucontext.h>
> >>> +#include <unistd.h>
> >>> +
> >>> +#include "rseq.h"
> >>> +#include "rseq-abi.h"
> >>> +
> >>> +static int pkey;
> >>> +static ucontext_t ucp0, ucp1;
> >>> +
> >>> +static void coroutine(void)
> >>> +{
> >>> + int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
> >>> + /*
> >>> + * When we disable access to pkey 0, globals and TLS become
> >>> + * inaccessible too, so we need to tread carefully.
> >>> + * Pkey is global so we need to copy it onto the stack.
> >>> + */
> >>> + int pk = RSEQ_READ_ONCE(pkey);
> >>> + struct timespec ts;
> >>> +
> >>> + orig_pk0 = pkey_get(0);
> >>> + if (pkey_set(0, PKEY_DISABLE_ACCESS))
> >>> + err(1, "pkey_set failed");
> >>> + old_pk0 = pkey_get(0);
> >>> + old_pk1 = pkey_get(pk);
> >>> +
> >>> + /*
> >>> + * Prevent compiler from initializing it by loading a 16-global.
> >>> + */
> >>> + RSEQ_WRITE_ONCE(ts.tv_sec, 0);
> >>> + RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
> >>> + /*
> >>> + * If the kernel misbehaves, context switches in the following loop
> >>> + * will terminate the process with SIGSEGV.
> >>> + * Trigger preemption w/o accessing TLS.
> >>> + * Note that glibc's usleep touches errno always.
> >>> + */
> >>> + for (i = 0; i < 10; i++)
> >>> + syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
> >>> +
> >>> + pk0 = pkey_get(0);
> >>> + pk1 = pkey_get(pk);
> >>> + if (pkey_set(0, orig_pk0))
> >>> + err(1, "pkey_set failed");
> >>> +
> >>> + /*
> >>> + * Ensure that the kernel has restored the previous value of pkeys
> >>> + * register after changing them.
> >>> + */
> >>> + if (old_pk0 != pk0)
> >>> + errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
> >>> + if (old_pk1 != pk1)
> >>> + errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
> >>> +
> >>> + swapcontext(&ucp1, &ucp0);
> >>> + abort();
> >>> +}
> >>> +
> >>> +int main(int argc, char **argv)
> >>> +{
> >>> + pkey = pkey_alloc(0, 0);
> >>> + if (pkey == -1) {
> >>> + printf("[SKIP]\tKernel does not support PKEYs: %s\n",
> >>> + strerror(errno));
> >>> + return 0;
> >>> + }
> >>> +
> >>> + if (rseq_register_current_thread())
> >>> + err(1, "rseq_register_current_thread failed");
> >>> +
> >>> + if (getcontext(&ucp1))
> >>> + err(1, "getcontext failed");
> >>> + ucp1.uc_stack.ss_size = getpagesize() * 4;
> >>> + ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
> >>> + PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
> >>> + if (ucp1.uc_stack.ss_sp == MAP_FAILED)
> >>> + err(1, "mmap failed");
> >>> + if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
> >>> + PROT_READ | PROT_WRITE, pkey))
> >>> + err(1, "pkey_mprotect failed");
> >>> + makecontext(&ucp1, coroutine, 0);
> >>> + if (swapcontext(&ucp0, &ucp1))
> >>> + err(1, "swapcontext failed");
> >>
> >> Should the rseq register be paired with a rseq unregister here ?
> >
> > I dunno. It's necessary for this test and in general. Tcmalloc won't
> > unregister on thread exit. Even for a libc it may be a bad idea due to
> > signals.
>
> The rseq register/unregister is only for the case where libc does not
> support rseq. But if you want to use rseq_register_current_thread(),
> then you'll want to pair it with unregister.

Why? Isn't it better to have tests more realitic?


> Thanks,
>
> Mathieu
>
> >
> >> Thanks,
> >>
> >> Mathieu
> >>
> >>> + return 0;
> >>> +}
> >>> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
> >>> index ba424ce80a719..65da4a727c550 100644
> >>> --- a/tools/testing/selftests/rseq/rseq.h
> >>> +++ b/tools/testing/selftests/rseq/rseq.h
> >>> @@ -8,6 +8,7 @@
> >>> #ifndef RSEQ_H
> >>> #define RSEQ_H
> >>>
> >>> +#include <assert.h>
> >>> #include <stdint.h>
> >>> #include <stdbool.h>
> >>> #include <pthread.h>
> >>
> >>
> >> --
> >> Mathieu Desnoyers
> >> EfficiOS Inc.
> >> https://www.efficios.com
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com