Re: [PATCH v4 3/4] rseq: Make rseq work with protection keys

From: Mathieu Desnoyers
Date: Tue Feb 25 2025 - 09:32:45 EST


On 2025-02-25 09:07, Dmitry Vyukov wrote:
On Mon, 24 Feb 2025 at 20:18, Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:

On 2025-02-24 08:20, Dmitry Vyukov wrote:
If an application registers rseq, and ever switches to another pkey
protection (such that the rseq becomes inaccessible), then any
context switch will cause failure in __rseq_handle_notify_resume()
attempting to read/write struct rseq and/or rseq_cs. Since context
switches are asynchronous and are outside of the application control
(not part of the restricted code scope), temporarily switch to
pkey value that allows access to the 0 (default) PKEY.

This is a good start, but the plan Dave and I discussed went further
than this. Those additions are needed:

1) Add validation at rseq registration that the struct rseq is indeed
pkey-0 memory (return failure if not).

I don't think this is worth it for multiple reasons:
- a program may first register it and then assign a key, which means
we also need to check in pkey_mprotect
- pkey_mprotect may be applied to rseq of another thread, so ensuring
that will require complex code with non-trivial synchronization and
will add considerable overhead to pkey_mprotect call
- a program may assign non-0 pkey but have it always accessible, such
programs will break by the new check
- the misuse is already detected by rseq code, and UNIX errno-based
reporting is not very informative and does not add much value on top
of existing reporting
- this is not different from registering rseq and then unmap'ing the
memory, checking that does not look like a good idea, and checking
only subset of misuses is inconsistent

Based on my experience with rseq, what would be useful is reporting a
meaningful siginfo for access errors (address/unique code) and fixing
signal delivery. That would solve all of the above problems, and
provide useful info for the user (not just confusing EINVAL from
mprotect/munmap).

But I would prefer to not mix these unrelated usability improvements
and bug fixes with this change. That's not related to this change.

I agree with your arguments. If Dave is OK with it, I'd be fine with
leaving out the pkey-0 validation on rseq registration, and eventually
bring meaningful siginfo access errors as future improvements.

So the new behavior would be that both rseq and rseq_cs are required
to be pkey-0. If they are not and their pkey is not accessible in the
current context, it would trigger a segmentation fault. Ideally we'd
want to document this somewhere in the UAPI header.



2) The pkey-0 requirement is only for struct rseq, which we can check
for at rseq registration, and happens to be the fast path. For struct
rseq_cs, this is not the same tradeoff: we cannot easily check its
associated pkey because the rseq_cs pointer is updated by userspace
when entering a critical section. But the good news is that reading
the content of struct rseq_cs is *not* a fast-path: it's only done
when preempting/delivering a signal over a thread which has a
non-NULL rseq_cs pointer.

rseq_cs is usually accessed on a hot path since rseq_cs pointer is not
cleared on critical section exit (at least that's what we do).

Fair point.


Therefore reading the struct rseq_cs content should be done with
write_permissive_pkey_val(), giving access to all pkeys.

You just asked me to redo the code to simplify it, won't this
complicate it back again? ;)

I'm fine with the pkey-0 approach for both rseq and rseq_cs if Dave is
also OK with it.

Thanks,

Mathieu



Thanks,

Mathieu


Signed-off-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Aruna Ramakrishna <aruna.ramakrishna@xxxxxxxxxx>
Cc: x86@xxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")

---
Changes in v4:
- Added Fixes tag

Changes in v3:
- simplify control flow to always enable access to 0 pkey

Changes in v2:
- fixed typos and reworded the comment
---
kernel/rseq.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/kernel/rseq.c b/kernel/rseq.c
index 2cb16091ec0ae..9d9c976d3b78c 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -10,6 +10,7 @@

#include <linux/sched.h>
#include <linux/uaccess.h>
+#include <linux/pkeys.h>
#include <linux/syscalls.h>
#include <linux/rseq.h>
#include <linux/types.h>
@@ -402,11 +403,19 @@ static int rseq_ip_fixup(struct pt_regs *regs)
void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
{
struct task_struct *t = current;
+ pkey_reg_t saved_pkey;
int ret, sig;

if (unlikely(t->flags & PF_EXITING))
return;

+ /*
+ * Enable access to the default (0) pkey in case the thread has
+ * currently disabled access to it and struct rseq/rseq_cs has
+ * 0 pkey assigned (the only supported value for now).
+ */
+ saved_pkey = enable_zero_pkey_val();
+
/*
* regs is NULL if and only if the caller is in a syscall path. Skip
* fixup and leave rseq_cs as is so that rseq_sycall() will detect and
@@ -419,9 +428,11 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
}
if (unlikely(rseq_update_cpu_node_id(t)))
goto error;
+ write_pkey_val(saved_pkey);
return;

error:
+ write_pkey_val(saved_pkey);
sig = ksig ? ksig->sig : 0;
force_sigsegv(sig);
}


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com