Re: [PATCH 3/4] rseq: Make rseq work with protection keys
From: Dmitry Vyukov
Date: Mon Feb 24 2025 - 08:40:31 EST
On Fri, 21 Feb 2025 at 22:45, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 2/21/25 13:36, Mathieu Desnoyers wrote:
> >>>
> >>
> >> Because the rseq return to userspace handler is called on every return
> >> to userspace after a task is scheduled back after preemption, I am
> >> concerned about the overhead that would be added by a WRPKRU on the
> >> fast-path, given that it acts as as barrier against speculation. Issuing
> >> WRPKRU only after checking that pkey-0 is not accessible appears to be
> >> moving the overhead to a much less common case.
> >
> > Actually, we should distinguish between two accesses here:
> >
> > A) loads/stores from/to struct rseq
> >
> > B) loads from struct rseq_cs (only happens on rseq abort)
> >
> > (A) is a fast-path executed on return to userspace after a preemption.
> > In order to make it fast, we could require that struct rseq is pkey-0
> > and typically skip any WRPKRU for this access when pkey-0 is already
> > accessible. We can add a check on rseq registration to make sure that
> > struct rseq is indeed pkey-0, and reject it with an error if not. This
> > should help make the ABI robust and less error-prone.
> >
> > Now for (B), it's a slow path. When we observe that rseq->rseq_cs is
> > not NULL, we can simply override with a permissive pkey to make sure
> > the rseq_cs access will work.
> >
> > Thoughts ?
> I think this will be the first ABI which is explicitly pkey-0-only. I
> suspect there are a few more of these that are implicit but we just
> haven't found them yet.
>
> I wouldn't have any objections about doing this, especially given
> sanity checking at rseq registration.
Thanks for the thoughtful review.
I've tried to incorporate all suggestions in v4:
https://lore.kernel.org/all/68427864e0ca38af06482c96728216c3e0973418.1740403209.git.dvyukov@xxxxxxxxxx/T/#m431c13b6140e447d41d228fb942d9e4b8a89874a
Yes, the intention for doing the switch only on the error path was to
avoid WRPKRU on the hot path.
For my current use case (at least how I currently have it implemented)
struct rseq and altstack are indeed protected with pkey 0. So I added
a new enable_zero_pkey_val() function that lazily enables only pkey 0.
Also added Fixes tag (for everything except for signal.c refactoring).
The "fixed" commit seems to be the original rseq patch d7822b1e24f2
("rseq: Introduce restartable sequences system call") b/c PKEYs
introduction is older.
I also had to significantly rework the test to make it work with rseq
protected by pkey 0 (which means we need to revoke access to pkey 0,
which means stack/tls/errno also become inaccessible).
Please take another look at v4.
I don't have preference as to how this should get into the Linus tree.
Hopefully the changes are not too pervasive for all subsystems to
cause massive conflicts when merging.