Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

From: Rich Felker
Date: Fri Nov 23 2018 - 13:36:51 EST


On Fri, Nov 23, 2018 at 12:52:21PM -0500, Mathieu Desnoyers wrote:
> ----- On Nov 23, 2018, at 12:30 PM, Rich Felker dalias@xxxxxxxx wrote:
>
> > On Fri, Nov 23, 2018 at 12:05:20PM -0500, Mathieu Desnoyers wrote:
> >> ----- On Nov 23, 2018, at 9:28 AM, Rich Felker dalias@xxxxxxxx wrote:
> >> [...]
> >> >
> >> > Absolutely. As long as it's in libc, implicit destruction will happen.
> >> > Actually I think the glibc code should unconditionally unregister the
> >> > rseq address at exit (after blocking signals, so no application code
> >> > can run) in case a third-party rseq library was linked and failed to
> >> > do so before thread exit (e.g. due to mismatched ref counts) rather
> >> > than respecting the reference count, since it knows it's the last
> >> > user. This would make potentially-buggy code safer.
> >>
> >> OK, let me go ahead with a few ideas/questions along that path.
> > ^^^^^^^^^^^^^^^
> >>
> >> Let's say our stated goal is to let the "exit" system call from the
> >> glibc thread exit path perform rseq unregistration (without explicit
> >> unregistration beforehand). Let's look at what we need.
> >
> > This is not "along that path". The above-quoted text is not about
> > assuming it's safe to make SYS_exit without unregistering the rseq
> > object, but rather about glibc being able to perform the
> > rseq-unregister syscall without caring about reference counts, since
> > it knows no other code that might depend on rseq can run after it.
>
> When saying "along that path", what I mean is: if we go in that direction,
> then we should look into going all the way there, and rely on thread
> exit to implicitly unregister the TLS area.
>
> Do you see any reason for doing an explicit unregistration at thread
> exit rather than simply rely on the exit system call ?

Whether this is needed is an implementation detail of glibc that
should be permitted to vary between versions. Unless glibc wants to
promise it as a public guarantee, it's not part of the discussion
around the API/ABI, only of the discussion around the implementation
internals of the glibc rseq stuff.

Of course I may be biased in thinking application code should not
assume this since it's not true on musl -- for detached threads, the
thread frees its own stack before exiting (and thus has to unregister
set_tid_address and set_robust_list before exiting).
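
To make that concrete, here is a rough sketch of the kind of cleanup
I mean, assuming Linux and raw syscalls. It is not musl's actual
code, and demo_drop_kernel_pointers is a made-up name:

```c
#define _GNU_SOURCE
#include <linux/futex.h>   /* struct robust_list_head */
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: make the kernel forget its pointers into this
 * thread's memory before that memory is freed.
 * Returns 0 on success, -1 on failure. */
static int demo_drop_kernel_pointers(void)
{
    /* Reset clear_child_tid to NULL so the kernel does not write 0
     * to (and futex-wake) that address when the thread exits. The
     * call returns the caller's thread ID. */
    long tid = syscall(SYS_set_tid_address, (int *)NULL);

    /* Reset the robust futex list head to NULL so the kernel does
     * not walk a list that lives in freed memory at exit. */
    long rc = syscall(SYS_set_robust_list,
                      (struct robust_list_head *)NULL,
                      sizeof(struct robust_list_head));

    return (tid > 0 && rc == 0) ? 0 : -1;
}
```

A thread that is about to unmap its own stack would call something
like this last, once nothing else can run on that thread.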

> >> First, we need the TLS area to be valid until the exit system call
> >> is invoked by the thread. If glibc defines __rseq_abi as a weak symbol,
> >> I'm not entirely sure we can guarantee the IE model if another library
> >> gets its own global-dynamic weak symbol elected at execution time. Would
> >> it be better to switch to a "strong" symbol for the glibc __rseq_abi
> >> rather than weak ?
> >
> > This doesn't help; still whichever comes first in link order would
> > override. Either way __rseq_abi would be in static TLS, though,
> > because any dynamically-loaded library is necessarily loaded after
> > libc, which is loaded at initial exec time.
>
> OK, so AFAIU you argue for leaving the __rseq_abi symbol "weak". Just making
> sure I correctly understand your position.

I don't think it matters, and I don't think making it weak is useful
(weak in a shared library is largely meaningless), but maybe I'm
missing something here.

> Something can be technically correct based on the current implementation,
> but fragile with respect to future changes. We need to carefully distinguish
> between the two when exposing ABIs.

Yes.

> >> There have been presumptions about signals being blocked when the thread
> >> exits throughout this email thread. Out of curiosity, what code is
> >> responsible for disabling signals in this situation ?
>
> This question is still open.

I can't find it -- maybe it's not done in glibc. It is in musl, and I
assumed glibc would also do it, because otherwise it's possible to see
some inconsistent states from signal handlers. Maybe these are all
undefined due to AS-unsafety of pthread_exit, but I think you can
construct examples where something could be observably wrong without
breaking any rules.

> >> Related to this,
> >> is it valid to access an IE model TLS variable from a signal handler at
> >> _any_ point where the signal handler nests over thread's execution ?
> >> This includes early start and just before invoking the exit system call.
> >
> > It should be valid to access *any* TLS object like this, but the
> > standards don't cover it well. Right now access to dynamic TLS from
> > signal handlers is unsafe in glibc, but static is safe.
>
> Which is a shame for the lttng-ust tracer, which needs global-dynamic
> TLS variables so it can be dlopen'd, but aims at allowing tracing from
> signal handlers. It looks like due to limitations of global-dynamic
> TLS, tracing from instrumented signal handlers with lttng-ust tracepoints
> could crash the process if the signal handler nests early at thread start
> or late before thread exit. One way out of this would be to ensure signals
> are blocked at thread start/exit, but I can't find the code responsible for
> doing this within glibc.

Just blocking at start/exit won't solve the problem because
global-dynamic TLS in glibc involves dynamic allocation, which is hard
to make AS-safe and of course can fail, leaving no way to make forward
progress.

Rich