Re: [PATCH 3/4] rseq: Make rseq work with protection keys

From: Mathieu Desnoyers
Date: Tue Feb 18 2025 - 10:27:42 EST


On 2025-02-18 10:10, Dmitry Vyukov wrote:
On Tue, 18 Feb 2025 at 15:57, Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:

On 2025-02-18 02:55, Dmitry Vyukov wrote:
On Mon, 17 Feb 2025 at 21:21, Mathieu Desnoyers
[...]

we still let this function read the rseq_cs.
+ * It's unclear what benefits the resticted code gets by doing this

restricted

+ * (it probably already hijacked control flow at this point), and
+ * presumably any sane sandbox should prohibit restricted code
+ * from accessing struct rseq, and this is still better than
+ * terminating the app unconditionally (it always has a choice
+ * of not using rseq and pkeys together).

Note that because userspace can complete an rseq critical section
without clearing the rseq_cs pointer, this could happen simply because
the kernel is preempting the task after it has:

1) completed an rseq critical section, without clearing rseq_cs,
2) changed pkey.

So allowing this is important, and I would remove the comment about
hijacked control flow and such. This can happen with normal use of the
ABI.

Thanks for the review!

I've addressed all comments in the series in v2.

I've reworded this paragraph to simplify sentences, but I still kept
the note aboud malicious rseq_cs.

If we would not be circumventing normal protection, then, yes, these
cases would be the same. But since we are circumventing protection
that otherwise exists, I think it's important to think about
potentially malicious cases. In this context inaccessible rseq_cs
values that resulted from normal execution are very different from
malicious onces. Normal ones will point to a fixed set of real
well-formed rseq_cs objects, while malicious ones may point to
god-knows-where in an attempt of an attacker to do things we can't
even imagine right now (e.g. rseq_cs overlapping with protected crypto
keys).

It's as if a particular instance of copy_to_user would allow
user-space to write arbitrary kernel memory, and memory of other
processes circumventing all normal protections. In that context we
would need to be very careful regarding what we actually allow.

I'm considering that we should clear the rseq_cs pointer whenever
userspace issues pkey_mprotect.

This would ensure that no legitimate scenario can trigger a load
from a rseq_cs area which has the wrong pkey, and therefore we
could accept read/write from/to a struct rseq which has the wrong
pkey, but kill the process if trying to read/write from a
struct rseq_cs with the wrong key. This would prevent userspace
from making the kernel read/write from/to memory with the wrong
pkey through a pointer it controls (rseq_cs pointer).

Thoughts ?

I am not following.

There are pkey_mprotect calls, then independently installs on rseq_cs
pointers that happen concurrently and after pkey_mprotect, and
independent set of pkey_set calls that happens concurrently and after
the previous 2.
I don't see how doing something at the pkey_mprotect call for the
single thread avoids any scenarios.

Hrm. Sorry, I mixed up pkey_set() vs pkey_mprotect(). What I had in mind
was actually pkey_set(). And that would need to clear rseq_cs for all
threads belonging to the process, which may not be straightforward
because those could legitimately be inside a rseq critical section.

OK, let's try another approach: rather than kill the process if
read/write of the rseq_cs area with the wrong key fails, could we simply
clear the rseq_cs pointer in that case ? Technically there would be no
legitimate use of this except for the case where it is meant to be lazily
cleared.

Thanks,

Mathieu

Moreover, pkey 0 is preinstalled for all pages, but access to it can
be revoked in future.


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com