Re: [PATCH 3/4] rseq: Make rseq work with protection keys

From: Mathieu Desnoyers
Date: Fri Feb 21 2025 - 16:11:27 EST


On 2025-02-21 15:50, Dave Hansen wrote:
On 2/21/25 12:05, Mathieu Desnoyers wrote:
On 2025-02-21 14:48, Dave Hansen wrote:
On 2/21/25 11:38, Mathieu Desnoyers wrote:
I agree that switching to a permissive key in the fast path would be
simpler. AFAIU, switch_to_permissive_pkey_reg() amounts to only a pkey
read when the key is already permissive.

Unfortunately, on x86, PKRU is almost never in its permissive state. We
chose a policy (stored in the global init_pkru_value variable) that
allows R/W access to pkey 0, but disables access to everything else.
It's 0x55555554, IIRC.
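
As an illustration of the policy described above, here is a minimal sketch
of how a PKRU value decodes, assuming the x86 layout of two bits per key
(access-disable in bit 2k, write-disable in bit 2k+1, 16 keys). The helper
function names are hypothetical and are not kernel APIs; the sketch only
does the bit arithmetic in plain C and never touches the real register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86 PKRU layout: 16 keys, two bits per key. */
#define NR_PKEYS           16
#define PKRU_BITS_PER_PKEY 2
#define PKRU_AD_BIT        0x1u /* access disable */
#define PKRU_WD_BIT        0x2u /* write disable */

/* Hypothetical helper: extract the two permission bits of one key. */
static uint32_t pkru_key_bits(uint32_t pkru, unsigned int pkey)
{
	assert(pkey < NR_PKEYS);
	return (pkru >> (pkey * PKRU_BITS_PER_PKEY)) & 0x3u;
}

/* Reads are allowed unless the access-disable bit is set. */
static bool pkru_allows_read(uint32_t pkru, unsigned int pkey)
{
	return !(pkru_key_bits(pkru, pkey) & PKRU_AD_BIT);
}

/* Writes are allowed only if neither disable bit is set. */
static bool pkru_allows_write(uint32_t pkru, unsigned int pkey)
{
	return !(pkru_key_bits(pkru, pkey) & (PKRU_AD_BIT | PKRU_WD_BIT));
}

/* Hypothetical helper: R/W for pkey 0, access disabled for keys 1..15. */
static uint32_t pkru_deny_all_but_key0(void)
{
	uint32_t pkru = 0;

	for (unsigned int pkey = 1; pkey < NR_PKEYS; pkey++)
		pkru |= PKRU_AD_BIT << (pkey * PKRU_BITS_PER_PKEY);
	return pkru;
}
```

With this layout, a value of all zeroes is the most permissive one, which
is why a deny-by-default policy has to set bits for every key other than
pkey 0.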

This ensures deny-by-default behavior and ensures that threads cloned
off long ago don't have a dangerous PKRU value for newly-allocated and
pkey-protected memory.

If I had a time machine, it'd be interesting to go back and try to make
PKRU's default value be all 0's and also represent the logically most
restrictive value.

Can we assume (or require) that struct rseq and struct rseq_cs reside in
pkey-0 memory?

Maybe. Signal stacks are _practically_ only able to use pkey-0. You can
technically protect them with anything you want and then WRPKRU as the
first instruction once you hop into the signal handler (since
instruction fetches aren't affected by x86 pkeys), but I seriously doubt
anybody would go to the trouble.

And that would not work on arm64. AFAIU, arm64 POR_EL0 also applies to
instruction fetches, which limits what can be done for signal handlers
if the code is intended to be portable.


In that case, we could add something to the pkey API that switches to a
permissive state only if pkey 0 cannot be accessed.

Therefore it would only trigger a pkey read in the common case, and
issue a pkey write only if pkey 0 is not accessible.

I think that's a sane policy. An rseq access can happen at any time
(from the app's perspective) so the access would theoretically be done
with a random PKRU value from a random point in the thread's lifetime.

But it is a different policy from the one we've chosen for signals and
"remote" accesses, which is to just ignore pkeys entirely.

I don't have a strong opinion. It's hard to balance performance and
consistency with the other ABI here.

Because the rseq return-to-userspace handler is called on every return
to userspace after a task is scheduled back in following preemption, I
am concerned about the overhead that a WRPKRU would add on the
fast path, given that it acts as a barrier against speculation. Issuing
WRPKRU only after checking that pkey-0 is not accessible appears to
move the overhead to a much less common case.
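
To make the read-then-conditionally-write idea concrete, here is a minimal
sketch against a simulated register. Everything in it is hypothetical:
struct sim_pkru and the sim_*() helpers stand in for the real pkey
register accessors, the counters only exist to show how many reads and
writes each path costs, and clearing just the pkey-0 bits (rather than
going fully permissive) is an assumption of the sketch, not something the
patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PKRU_AD_BIT 0x1u /* access disable, pkey 0 lives in bits 1:0 */
#define PKRU_WD_BIT 0x2u /* write disable */

/* Simulated pkey register; the counters stand in for RDPKRU/WRPKRU. */
struct sim_pkru {
	uint32_t value;
	unsigned int reads;
	unsigned int writes;
};

static uint32_t sim_read_pkru(struct sim_pkru *reg)
{
	reg->reads++;
	return reg->value;
}

static void sim_write_pkru(struct sim_pkru *reg, uint32_t value)
{
	reg->writes++;
	reg->value = value;
}

/*
 * Hypothetical helper: make pkey 0 readable and writable. The common
 * case costs one register read; the write happens only when pkey 0 is
 * currently restricted. Returns true if the register was modified.
 */
static bool sim_enable_pkey0(struct sim_pkru *reg, uint32_t *saved)
{
	uint32_t old;

	assert(reg && saved);
	old = sim_read_pkru(reg);
	*saved = old;
	if (!(old & (PKRU_AD_BIT | PKRU_WD_BIT)))
		return false; /* fast path: pkey 0 already accessible */
	sim_write_pkru(reg, old & ~(PKRU_AD_BIT | PKRU_WD_BIT));
	return true;
}

/* Restore the saved value, but only if it was changed above. */
static void sim_restore_pkru(struct sim_pkru *reg, uint32_t saved,
			     bool modified)
{
	if (modified)
		sim_write_pkru(reg, saved);
}
```

A caller would bracket the rseq accesses with sim_enable_pkey0() and
sim_restore_pkru(); when pkey 0 is already accessible, the pair performs
one read and no write.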

And if we end up observing that, for some reason, the sigframe and/or
"remote" pkey accesses really must use pkey-0 as well to work in
real life, then we could make them require pkey-0. That's of course
assuming it would cause no observable ABI breakage. One advantage here
would be to speed up signal handler delivery.

I have no clue what a "remote" pkey access is. Is this the io_uring
use case?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com