Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v10)

From: Florian Weimer
Date: Tue Jun 04 2019 - 07:50:16 EST

* Mathieu Desnoyers:

> ----- On May 31, 2019, at 11:46 AM, Florian Weimer fweimer@xxxxxxxxxx wrote:
>> * Mathieu Desnoyers:
>>> Let's break this down into the various sub-issues involved:
>>> 1) How early do we need to set up rseq? Should it be set up before:
>>> - LD_PRELOAD .so constructors ?
>>> - Without circular dependency,
>>> - With circular dependency,
>>> - audit libraries initialization ?
>>> - IFUNC resolvers ?
>>> - other callbacks ?
>>> - memory allocator calls ?
>>> We may end up in a situation where we need memory allocation to be set up
>>> in order to initialize TLS before rseq can be registered for the main
>>> thread. I suspect we will end up needing fallbacks that always work
>>> for the few cases that would try to use rseq too early in dl/libc startup.
>> I think the answer to that depends on whether it's okay to have an
>> observable transition from "no rseq kernel support" to "kernel supports
>> rseq".
> As far as my own use-cases are concerned, I only care that rseq is initialized
> before LD_PRELOAD .so constructors are executed.
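The fallback idea from item 1 can be sketched as follows. This is a minimal sketch that assumes a simplified, hypothetical stand-in for the per-thread rseq area (`struct demo_rseq_area`, not the kernel's `struct rseq`), with sentinel values that mirror the kernel's "uninitialized" and "registration failed" states. A reader that finds the area not yet registered falls back to sched_getcpu(), so code running before registration still works; the switch from the fallback path to the rseq path is the observable transition mentioned above.

```c
#define _GNU_SOURCE
#include <sched.h>   /* sched_getcpu */
#include <stdint.h>

/* Hypothetical sentinels mirroring the kernel's
   RSEQ_CPU_ID_UNINITIALIZED (-1) and RSEQ_CPU_ID_REGISTRATION_FAILED (-2). */
enum {
  DEMO_CPU_ID_UNINITIALIZED = -1,
  DEMO_CPU_ID_REGISTRATION_FAILED = -2,
};

/* Hypothetical stand-in for the rseq area; only cpu_id is modeled. */
struct demo_rseq_area {
  volatile int32_t cpu_id;
};

/* Per-thread area, starting in the "not registered yet" state. */
static __thread struct demo_rseq_area demo_rseq = {
  .cpu_id = DEMO_CPU_ID_UNINITIALIZED
};

/* Return the current CPU number. Use the rseq area once registration
   has completed (cpu_id >= 0); otherwise fall back to sched_getcpu(),
   which does not require rseq registration. */
static int demo_current_cpu(void)
{
  int32_t cpu = demo_rseq.cpu_id;
  if (cpu >= 0)
    return cpu;
  return sched_getcpu();  /* returns -1 on error */
}
```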

<> is relevant in
this context. It requests the opposite behavior from LD_PRELOAD.

> There appears to be some amount of documented limitations for what can be
> done by the IFUNC resolvers. It might be acceptable to document that rseq
> might not be initialized yet when those are executed.

The only obstacle is that there are so many places where we could put
this information.
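If that limitation were documented, a conforming resolver would look roughly like the sketch below. An IFUNC resolver may run while relocations are being processed, before constructors, so it selects an implementation without touching the rseq area. The names `demo_count`, `demo_count_generic` and `demo_count_resolver` are hypothetical; the `ifunc` attribute is the GCC/Clang extension for ELF targets.

```c
#include <stddef.h>

/* A single candidate implementation of a hypothetical routine. */
static size_t demo_count_generic(const char *s)
{
  size_t n = 0;
  while (s[n] != '\0')
    n++;
  return n;
}

typedef size_t demo_count_fn(const char *);

/* The resolver may run before any constructor (and so before rseq
   registration), so it must not read the rseq area. It only picks a
   function pointer; a real resolver would usually check CPU features. */
static demo_count_fn *demo_count_resolver(void)
{
  return demo_count_generic;
}

/* Calls to demo_count go through the pointer the resolver returned. */
size_t demo_count(const char *s) __attribute__((ifunc("demo_count_resolver")));
```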

> I'd like to hear what others think about whether we should care about IFUNC
> resolvers and audit libraries using the restartable sequences TLS?

In audit libraries (and after dlmopen), the inner libc will have
duplicated TLS values, so it will look as if the TLS area is not active
(but a registration has happened with the kernel). If we move
__rseq_handled into the dynamic linker, its value will be shared with
the inner objects. However, the inner libc still has to
ensure that its registration attempt does not succeed because that would
activate the wrong rseq area.

The final remaining case is static dlopen. There is a copy of on
the dynamic side, but it is completely inactive and has never run. I do
not think we need to support that because multi-threading does not work
reliably in this scenario, either. However, we should skip rseq
registration in a nested libc (see the rtld_active function).
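The two guards described above can be modeled as follows. This is a hypothetical sketch: a flag shared through the dynamic linker (standing in for `__rseq_handled`) ensures that only the first libc copy registers, and an `rtld_active`-style check keeps a nested libc from static dlopen from trying at all. The `demo_*` names and the `rtld_active` field are illustrative assumptions, not glibc internals.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag owned by the dynamic linker and shared by every
   libc copy in the process (the role of __rseq_handled in ld.so). */
static atomic_bool demo_rseq_handled = false;

/* Hypothetical per-libc-copy state. */
struct demo_libc {
  bool rtld_active;  /* false for a nested libc after static dlopen */
  bool registered;   /* did this copy register its own rseq area? */
};

/* Register this copy's rseq area only if it is not a nested libc and
   no other copy has registered yet. The compare-exchange makes
   "first caller wins" atomic. Returns true if this copy registered. */
static bool demo_try_register(struct demo_libc *libc)
{
  if (!libc->rtld_active)
    return false;  /* nested libc: never register */
  bool expected = false;
  if (!atomic_compare_exchange_strong(&demo_rseq_handled, &expected, true))
    return false;  /* e.g. inner libc after dlmopen: area already owned */
  /* A real implementation would issue the rseq system call here. */
  libc->registered = true;
  return true;
}
```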

>>> 4) Inability to touch a TLS variable (__rseq_abi) from ld-linux-*.so.2
>>> - Should we extend the dynamic linker to allow such a TLS variable to be
>>> accessed? If so, how much effort is required?
>>> - Can we find an alternative way to initialize rseq early during
>>> dl init stages while still performing the TLS access from a function
>>> implemented within ?
>> This is again related to the answer for (1). There are various hacks we
>> could implement to make the initialization invisible (e.g., computing
>> the address of the variable using the equivalent of dlsym, after loading
>> all the initial objects and before starting relocation). If it's not
>> too hard to add TLS support to, we can consider that as well.
>> (The allocation side should be pretty easy; relocation support could
>> be trickier.)
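One possible reading of the alternative in item 4, sketched with hypothetical names: the TLS variable and the code that touches it both stay in libc, and the dynamic linker only calls a hook that libc installed, so ld.so itself needs no TLS relocation. This is an illustration under that assumption, not a description of glibc's actual mechanism.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TLS variable defined in libc; only libc code needs a
   TLS relocation to reach it. */
static __thread int32_t demo_rseq_cpu_id = -1;

/* Hypothetical hook table filled in by libc and read by ld.so. */
struct demo_rtld_hooks {
  void (*init_rseq)(void);
};

static struct demo_rtld_hooks demo_hooks;

/* libc side: the TLS access happens here. A real implementation would
   pass the address of libc's rseq area to the rseq system call; this
   sketch only writes the variable to show where the access lives. */
static void demo_libc_init_rseq(void)
{
  demo_rseq_cpu_id = 0;
}

static void demo_libc_install_hooks(void)
{
  demo_hooks.init_rseq = demo_libc_init_rseq;
}

/* ld.so side: after loading the initial objects and setting up TLS,
   call through the hook, then (not shown) run ELF constructors,
   including those of LD_PRELOAD objects. */
static void demo_rtld_start_user_code(void)
{
  if (demo_hooks.init_rseq != NULL)
    demo_hooks.init_rseq();
}
```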
>>> So far, I got rseq to be initialized before LD_PRELOADed library
>>> constructors by doing the initialization in a constructor within
>>> I don't particularly like this approach, because the
>>> constructor order is not guaranteed.
>> Right.
> One question related to use of constructors: AFAIU, if a library depends
> on glibc, ELF guarantees that the glibc constructor will be executed first,
> before the other library.

There are some exceptions, like DT_PREINIT_ARRAY functions and
DF_1_INITFIRST. Some of these mechanisms we use in the implementation
itself, so they are not really usable by end users. Cycles should not
come into play here.

By default, an object that uses the rseq area will have to link against
libc (perhaps indirectly), and therefore the libc constructor runs

> Which leaves us with the execution order of constructors within,
> which is not guaranteed if we just use __attribute__ ((constructor)).
> However, all gcc versions that are required to build recent glibc
> seem to support a constructor with a "priority" value (lower gets
> executed first, and those are executed before constructors without
> priority).
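A minimal sketch of the priority mechanism referred to above, assuming GCC or Clang on an ELF target with the usual linker scripts: within one object, prioritized constructors run in ascending priority order, followed by constructors without a priority. Priorities 0 to 100 are reserved for the implementation (GCC warns about them), so the example starts at 101. The function and variable names are hypothetical.

```c
/* Records the order in which the constructors below ran. */
static int demo_order[3];
static int demo_ctor_count;

/* Lower priority numbers run first. */
__attribute__((constructor(101)))
static void demo_ctor_early(void) { demo_order[demo_ctor_count++] = 101; }

__attribute__((constructor(200)))
static void demo_ctor_later(void) { demo_order[demo_ctor_count++] = 200; }

/* No priority: runs after the prioritized constructors in this object. */
__attribute__((constructor))
static void demo_ctor_default(void) { demo_order[demo_ctor_count++] = 0; }
```

By the time main runs, demo_order holds 101, 200, 0 in that order.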

I'm not sure that's the right way to do it. If we want execution to
happen in a specific order, we should write a single constructor
function which is called from _init. For the time being, we can add the
call to an appropriately defined inline function early in _init in
elf/init-first.c (which is shared with Hurd, so Hurd will need some sort
of stub function).
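The suggested alternative, sketched with hypothetical names: an inline helper called at the top of a single init function fixes the order explicitly, and a port without rseq (such as Hurd) supplies an empty stub with the same name. `demo_init_first` stands in for `_init`, and `DEMO_HAVE_RSEQ` is an assumed build switch, not a real glibc macro.

```c
/* Hypothetical build switch: 1 for Linux, 0 for ports without rseq. */
#ifndef DEMO_HAVE_RSEQ
# define DEMO_HAVE_RSEQ 1
#endif

enum { DEMO_STEP_RSEQ = 1, DEMO_STEP_OTHER = 2 };

/* Records which init steps ran, in order. */
static int demo_steps[2];
static int demo_nsteps;

#if DEMO_HAVE_RSEQ
/* Linux variant: would register the main thread's rseq area. */
static inline void demo_rseq_register_current_thread(void)
{
  demo_steps[demo_nsteps++] = DEMO_STEP_RSEQ;
}
#else
/* Stub for ports without rseq support. */
static inline void demo_rseq_register_current_thread(void)
{
}
#endif

/* Stand-in for _init: one function fixes the order explicitly instead
   of relying on the relative order of several constructors. */
static void demo_init_first(void)
{
  demo_rseq_register_current_thread();         /* first */
  demo_steps[demo_nsteps++] = DEMO_STEP_OTHER; /* remaining libc setup */
}
```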