Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys

From: Dmitry Vyukov
Date: Mon Mar 10 2025 - 12:31:39 EST


On Mon, 10 Mar 2025 at 16:41, Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> On 2025-03-10 10:43, Dmitry Vyukov wrote:
> > On Mon, 10 Mar 2025 at 15:38, Mathieu Desnoyers
> > <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> >>
> >> On 2025-03-10 10:36, Dmitry Vyukov wrote:
> >>> On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
> >>> <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> >>>>
> >>>> On 2025-02-27 09:03, Dmitry Vyukov wrote:
> >>>>> Add a test that ensures that PKEY-protected struct rseq_cs
> >>>>> works and does not lead to process kills.
> >>>>>
> >>>>> Signed-off-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> >>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> >>>>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> >>>>> Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
> >>>>> Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
> >>>>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> >>>>> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> >>>>> Cc: Borislav Petkov <bp@xxxxxxxxx>
> >>>>> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> >>>>> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> >>>>> Cc: Aruna Ramakrishna <aruna.ramakrishna@xxxxxxxxxx>
> >>>>> Cc: x86@xxxxxxxxxx
> >>>>> Cc: linux-kernel@xxxxxxxxxxxxxxx
> >>>>> Acked-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> >>>>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
> >>>>>
> >>>>> ---
> >>>>> Changes in v5:
> >>>>> - Use static for variables/functions
> >>>>> - Use RSEQ_READ/WRITE_ONCE instead of volatile
> >>>>>
> >>>>> Changes in v4:
> >>>>> - Added Fixes tag
> >>>>>
> >>>>> Changes in v3:
> >>>>> - added Acked-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> >>>>> - rework the test to work when only pkey 0 is supported for rseq
> >>>>>
> >>>>> Changes in v2:
> >>>>> - change test to install protected rseq_cs instead of rseq
> >>>>> ---
> >>>>> tools/testing/selftests/rseq/Makefile | 2 +-
> >>>>> tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
> >>>>> tools/testing/selftests/rseq/rseq.h | 1 +
> >>>>> 3 files changed, 100 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
> >>>>> index 5a3432fceb586..9111d25fea3af 100644
> >>>>> --- a/tools/testing/selftests/rseq/Makefile
> >>>>> +++ b/tools/testing/selftests/rseq/Makefile
> >>>>> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
> >>>>>
> >>>>> TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
> >>>>> param_test_benchmark param_test_compare_twice param_test_mm_cid \
> >>>>> - param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
> >>>>> + param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
> >>>>>
> >>>>> TEST_GEN_PROGS_EXTENDED = librseq.so
> >>>>>
> >>>>> diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
> >>>>> new file mode 100644
> >>>>> index 0000000000000..cc4dd98190942
> >>>>> --- /dev/null
> >>>>> +++ b/tools/testing/selftests/rseq/pkey_test.c
> >>>>> @@ -0,0 +1,98 @@
> >>>>> +// SPDX-License-Identifier: LGPL-2.1
> >>>>> +/*
> >>>>> + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
> >>>>> + */
> >>>>> +
> >>>>> +#define _GNU_SOURCE
> >>>>> +#include <err.h>
> >>>>> +#include <errno.h>
> >>>>> +#include <stdio.h>
> >>>>> +#include <stdlib.h>
> >>>>> +#include <string.h>
> >>>>> +#include <sys/mman.h>
> >>>>> +#include <sys/syscall.h>
> >>>>> +#include <ucontext.h>
> >>>>> +#include <unistd.h>
> >>>>> +
> >>>>> +#include "rseq.h"
> >>>>> +#include "rseq-abi.h"
> >>>>> +
> >>>>> +static int pkey;
> >>>>> +static ucontext_t ucp0, ucp1;
> >>>>> +
> >>>>> +static void coroutine(void)
> >>>>> +{
> >>>>> + int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
> >>>>> + /*
> >>>>> + * When we disable access to pkey 0, globals and TLS become
> >>>>> + * inaccessible too, so we need to tread carefully.
> >>>>> + * Pkey is global so we need to copy it onto the stack.
> >>>>> + */
> >>>>> + int pk = RSEQ_READ_ONCE(pkey);
> >>>>> + struct timespec ts;
> >>>>> +
> >>>>> + orig_pk0 = pkey_get(0);
> >>>>> + if (pkey_set(0, PKEY_DISABLE_ACCESS))
> >>>>> + err(1, "pkey_set failed");
> >>>>> + old_pk0 = pkey_get(0);
> >>>>> + old_pk1 = pkey_get(pk);
> >>>>> +
> >>>>> + /*
> >>>>> + * Prevent compiler from initializing it by loading a 16-global.
> >>>>> + */
> >>>>> + RSEQ_WRITE_ONCE(ts.tv_sec, 0);
> >>>>> + RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
> >>>>> + /*
> >>>>> + * If the kernel misbehaves, context switches in the following loop
> >>>>> + * will terminate the process with SIGSEGV.
> >>>>> + * Trigger preemption w/o accessing TLS.
> >>>>> + * Note that glibc's usleep touches errno always.
> >>>>> + */
> >>>>> + for (i = 0; i < 10; i++)
> >>>>> + syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
> >>>>> +
> >>>>> + pk0 = pkey_get(0);
> >>>>> + pk1 = pkey_get(pk);
> >>>>> + if (pkey_set(0, orig_pk0))
> >>>>> + err(1, "pkey_set failed");
> >>>>> +
> >>>>> + /*
> >>>>> + * Ensure that the kernel has restored the previous value of pkeys
> >>>>> + * register after changing them.
> >>>>> + */
> >>>>> + if (old_pk0 != pk0)
> >>>>> + errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
> >>>>> + if (old_pk1 != pk1)
> >>>>> + errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
> >>>>> +
> >>>>> + swapcontext(&ucp1, &ucp0);
> >>>>> + abort();
> >>>>> +}
> >>>>> +
> >>>>> +int main(int argc, char **argv)
> >>>>> +{
> >>>>> + pkey = pkey_alloc(0, 0);
> >>>>> + if (pkey == -1) {
> >>>>> + printf("[SKIP]\tKernel does not support PKEYs: %s\n",
> >>>>> + strerror(errno));
> >>>>> + return 0;
> >>>>> + }
> >>>>> +
> >>>>> + if (rseq_register_current_thread())
> >>>>> + err(1, "rseq_register_current_thread failed");
> >>>>> +
> >>>>> + if (getcontext(&ucp1))
> >>>>> + err(1, "getcontext failed");
> >>>>> + ucp1.uc_stack.ss_size = getpagesize() * 4;
> >>>>> + ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
> >>>>> + PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
> >>>>> + if (ucp1.uc_stack.ss_sp == MAP_FAILED)
> >>>>> + err(1, "mmap failed");
> >>>>> + if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
> >>>>> + PROT_READ | PROT_WRITE, pkey))
> >>>>> + err(1, "pkey_mprotect failed");
> >>>>> + makecontext(&ucp1, coroutine, 0);
> >>>>> + if (swapcontext(&ucp0, &ucp1))
> >>>>> + err(1, "swapcontext failed");
> >>>>
> >>>> Should the rseq register be paired with a rseq unregister here ?
> >>>
> >>> I dunno. It's necessary for this test and in general. Tcmalloc won't
> >>> unregister on thread exit. Even for a libc it may be a bad idea due to
> >>> signals.
> >>
> >> The rseq register/unregister is only for the case where libc does not
> >> support rseq. But if you want to use rseq_register_current_thread(),
> >> then you'll want to pair it with unregister.
> >
> > Why? Isn't it better to have tests more realitic?
>
> If you use rseq.c rseq_register_current_thread without
> rseq_unregister_current_thread, then you rely on implicit
> unregistration done by the kernel at thread exit.
>
> Then you need to ensure your userspace runtime keep the TLS
> that holds the rseq area allocated for the entire execution
> of the thread until it exits in the kernel. Note that
> disabling signals is not sufficient to prevent the kernel
> from accessing the rseq area.
>
> GNU libc gets away with automatic unregistration at
> thread exit because it completely controls the lifetime
> of the registered rseq area, keeping it allocated for as
> long as the thread is executing.
>
> So in order to minimize the dependency on specific libc
> behavior in the kernel sefltests, the selftests rseq.h
> requires explicit registration *and* unregistration.

If libc registers rseq (!rseq_ownership), then
rseq_unregister_current_thread is a no-op. And if libc has not
registered rseq, then we are not relying on any libc behavior. I don't
see how calling rseq_unregister_current_thread helps to rely less on
libc. Am I missing something?




> Thanks,
>
> Mathieu
>
>
> >
> >
> >> Thanks,
> >>
> >> Mathieu
> >>
> >>>
> >>>> Thanks,
> >>>>
> >>>> Mathieu
> >>>>
> >>>>> + return 0;
> >>>>> +}
> >>>>> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
> >>>>> index ba424ce80a719..65da4a727c550 100644
> >>>>> --- a/tools/testing/selftests/rseq/rseq.h
> >>>>> +++ b/tools/testing/selftests/rseq/rseq.h
> >>>>> @@ -8,6 +8,7 @@
> >>>>> #ifndef RSEQ_H
> >>>>> #define RSEQ_H
> >>>>>
> >>>>> +#include <assert.h>
> >>>>> #include <stdint.h>
> >>>>> #include <stdbool.h>
> >>>>> #include <pthread.h>
> >>>>
> >>>>
> >>>> --
> >>>> Mathieu Desnoyers
> >>>> EfficiOS Inc.
> >>>> https://www.efficios.com
> >>
> >>
> >> --
> >> Mathieu Desnoyers
> >> EfficiOS Inc.
> >> https://www.efficios.com
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com