Re: [RFC PATCH for 4.17 02/21] rseq: Introduce restartable sequences system call (v12)

From: Peter Zijlstra
Date: Thu Mar 29 2018 - 10:24:18 EST


On Thu, Mar 29, 2018 at 09:54:01AM -0400, Mathieu Desnoyers wrote:
> Let's say we disallow system calls from rseq critical sections. A few points
> arise:
>
> - We still need to allow traps (page faults, breakpoints, ...) within rseq c.s.,
>
> - We still need to allow interrupts within rseq c.s.

Sure, but all those are different entry points, so that shouldn't be a
problem.

> - We need to decide whether we just document that syscalls within rseq c.s.
> are not supported, or we enforce a behavior if this happens (e.g. SIGSEGV).
> If we enforce a SIGSEGV, we'd have to figure out whether it's worth it to
> add extra branches to the system call fast path to validate this.

Without enforcement someone will eventually do this :/ We might (maybe)
get away with it being a debug option somewhere, but even that sounds
like trouble.
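
For reference, the kind of extra branch being discussed could look roughly
like the sketch below. It is a simplified userspace model, not kernel code:
struct cs_desc, struct task_model and syscall_entry_violates_rseq() are
hypothetical names, and the model assumes the descriptor pointer is directly
readable, whereas the real pointer lives in the user-space TLS area.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of an rseq critical-section descriptor. */
struct cs_desc {
	uintptr_t start_ip;           /* first instruction of the c.s. */
	uintptr_t post_commit_offset; /* length of the c.s. in bytes */
	uintptr_t abort_ip;           /* where to resume on abort */
};

/* Hypothetical per-thread state: NULL means no c.s. is currently armed. */
struct task_model {
	const struct cs_desc *rseq_cs;
};

/* True when ip lies in [start_ip, start_ip + post_commit_offset). */
static bool ip_in_cs(const struct cs_desc *cs, uintptr_t ip)
{
	/* Unsigned wrap-around makes ip < start_ip compare as "too large". */
	return ip - cs->start_ip < cs->post_commit_offset;
}

/*
 * Assumed enforcement check at syscall entry: returns true when the
 * syscall was issued from inside an armed c.s. and should be refused
 * (for example with SIGSEGV). The common case costs one load and one
 * branch on the NULL test.
 */
static bool syscall_entry_violates_rseq(const struct task_model *t,
					uintptr_t ip)
{
	if (t->rseq_cs == NULL)
		return false;
	return ip_in_cs(t->rseq_cs, ip);
}
```

The NULL test is the part that would land on the fast path; the range
comparison only runs for threads that currently have a descriptor armed.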

> - We need to carefully consider the case of system calls issued within signal
> handlers nested on top of rseq. When RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL is
> _not_ set, neither in the rseq c.s. descriptor nor in the TLS @flags,
> it's pretty much straightforward: upon signal delivery, the kernel moves the
> ip to abort, and clears the tls @rseq_cs pointer. This means that any system
> call issued within the signal handler is not actually within the rseq c.s.
> upon which the signal is nested.
>
> The case I worry about is if a thread sets the RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
> flag in its TLS @flags field (useful in a debugging scenario where we want a
> debugger to single-step through the rseq c.s. and observe registers at each step).
> Arguably, this is only ever used in development. However, it does allow a situation
> where a system call executed within a signal handler can nest over an rseq c.s.
> So if we choose to be very strict and SIGSEGV any syscall nested over rseq
> c.s., we may very well end up killing the process for no good reason in this
> scenario.

Yes, that needs a little thought; but when we run the signal handler,
the IP would no longer be inside the active rseq c.s., right?
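
To make the two cases concrete, here is a small userspace model of the
signal-delivery behaviour described in the quoted text. It is only a sketch
under stated assumptions: struct thread_model, deliver_signal() and the
NO_RESTART_ON_SIGNAL bit value are illustrative stand-ins, not the names or
values used by the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit value, standing in for RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL. */
#define NO_RESTART_ON_SIGNAL 0x2u

/* Hypothetical, simplified critical-section descriptor. */
struct cs_desc {
	uint32_t flags;
	uintptr_t start_ip;
	uintptr_t post_commit_offset;
	uintptr_t abort_ip;
};

/* Hypothetical thread state: current ip, TLS @flags, TLS @rseq_cs. */
struct thread_model {
	uintptr_t ip;
	uint32_t tls_flags;
	const struct cs_desc *rseq_cs;
};

/* True when a descriptor is armed and ip lies inside its range. */
static bool in_cs(const struct thread_model *t)
{
	return t->rseq_cs != NULL &&
	       t->ip - t->rseq_cs->start_ip < t->rseq_cs->post_commit_offset;
}

/*
 * Models signal delivery. Unless the no-restart flag is set in either
 * the descriptor or the TLS flags, an interrupted c.s. is aborted: ip
 * moves to abort_ip and the descriptor pointer is cleared. Returns the
 * ip saved for sigreturn; on return t->ip is the handler address.
 */
static uintptr_t deliver_signal(struct thread_model *t, uintptr_t handler_ip)
{
	uintptr_t saved;

	if (in_cs(t)) {
		uint32_t flags = t->tls_flags | t->rseq_cs->flags;

		if (!(flags & NO_RESTART_ON_SIGNAL)) {
			t->ip = t->rseq_cs->abort_ip;
			t->rseq_cs = NULL;
		}
	}
	saved = t->ip;
	t->ip = handler_ip;
	return saved;
}
```

In this model, with the flag set the descriptor pointer stays armed while the
handler runs, yet in_cs() is false because the ip is at the handler. A check
keyed on the ip range would therefore not fire for syscalls made from the
handler, while a check keyed only on a non-NULL pointer would.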

> - We need to decide whether all syscalls are disallowed, or if we want to pick
> specific ones (e.g. fork()).

All.