Re: Restartable Sequences system call merged into Linux

From: Mathieu Desnoyers
Date: Tue Jun 12 2018 - 12:31:30 EST

----- On Jun 12, 2018, at 9:11 AM, Florian Weimer fweimer@xxxxxxxxxx wrote:

> On 06/11/2018 10:04 PM, Mathieu Desnoyers wrote:
>> ----- On Jun 11, 2018, at 3:55 PM, Florian Weimer fweimer@xxxxxxxxxx wrote:
>>> On 06/11/2018 09:49 PM, Mathieu Desnoyers wrote:
>>>> It should be noted that there can be only one rseq TLS area registered per
>>>> thread,
>>>> which can then be used by many libraries and by the executable, so this is a
>>>> process-wide (per-thread) resource that we need to manage carefully.
>>> Is it possible to resize the area after thread creation, perhaps even
>>> from other threads?
>> I'm not sure why we would want to resize it. The per-thread area is fixed-size.
>> Its layout is here: include/uapi/linux/rseq.h: struct rseq
> Looks I was mistaken and this is very similar to the robust mutex list.
> Should we treat it the same way? Always allocate it for each new thread
> and register it with the kernel?

That would be an efficient way to do it, indeed. There is very little
performance overhead to have rseq registered for all threads, whether or
not they intend to run rseq critical sections.
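
For illustration, registering the current thread's area comes down to one
system call on the fixed-size struct rseq. What follows is only a minimal
sketch, assuming the uapi header <linux/rseq.h> and __NR_rseq are available
on the build host. The example_* names and the signature value are
hypothetical placeholders, not an agreed-upon ABI:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/rseq.h>

/* Hypothetical signature value, chosen only for this sketch. */
#define EXAMPLE_RSEQ_SIG 0x53053053

/*
 * One fixed-size area per thread. initial-exec avoids the cost of
 * global-dynamic TLS accesses; the name is hypothetical.
 */
static __thread struct rseq example_rseq_area
	__attribute__((tls_model("initial-exec")));

/* Returns 0 on success, or a negative errno value on failure. */
static int example_rseq_register_current_thread(void)
{
#ifdef __NR_rseq
	long ret = syscall(__NR_rseq, &example_rseq_area,
			   sizeof(example_rseq_area), 0, EXAMPLE_RSEQ_SIG);

	return ret == 0 ? 0 : -errno;
#else
	/* Kernel headers predate rseq. */
	return -ENOSYS;
#endif
}
```

Since only one area can be registered per thread, the call is expected to
fail if something else in the process has already registered a different
area for this thread, so callers need to handle the error rather than
assume success.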

>> The ABI is designed so that all users (program and libraries) can interact
>> through this per-thread TLS area.
> Then the user code needs just the address of the structure.

> How much coordination is needed between different users of this
> interface? Looking at the section hacks, I don't think we want to
> put this into glibc at this stage. It looks more like something for
> which we traditionally require compiler support.

I really don't mind maintaining a separate project containing librseq
along with the headers needed to facilitate declaration of rseq critical
sections. This specifically does not need much coordination between users of
the interface.

The part which really requires coordination between users is registration
to the kernel (and ownership) of the rseq TLS area.

I have a few possible approaches in mind (feel free to suggest others):

A) glibc exposes a strong __rseq_abi TLS symbol:

- should ideally *not* be global-dynamic for performance reasons,
- registration to the kernel can either be handled explicitly by requiring
  applications or libraries to call an API, or implicitly at thread
  creation,
- requires all rseq users to upgrade to a newer glibc. Early rseq users
  (libs and applications) registering their own rseq TLS will conflict
  with the newer glibc.

B) librseq exposes a strong __rseq_abi symbol:

- should ideally *not* be global-dynamic for performance reasons, but
  testing shows that using initial-exec causes issues in situations where
  librseq ends up being dlopen'd (e.g. java virtual machine dlopening
  the lttng-ust tracer linked against librseq),
- registration/unregistration of the area to the kernel can either be
  performed lazily on first use, with destruction done using pthread_key,
  or require an explicit API call from the application,
- a per-thread refcount in a TLS could allow many users to call the
  registration/unregistration API, and lazy registration,
- an early-user application which also exposes a __rseq_abi strong symbol
  would conflict with librseq.
C) __rseq_abi symbol declared weak within each user (application, librseq,
   other libraries, glibc):

- should ideally *not* be global-dynamic for performance reasons,
- however, initial-exec causes issues when librseq or early user libraries
  are dlopen'd (e.g. java runtime dlopening lttng-ust),
- a weak symbol allows combining early user libs/apps with glibc/librseq
  exposing the same symbol,
- considering that glibc is AFAIK never dlopen'd, it does not cause
  exhaustion of initial-exec TLS entries in cases where librseq or early
  adopter libs are dlopen'd,
- if glibc implicitly registers the rseq area, *and* librseq also wants
  to register it, *and* early adopters also want to register it, we should
  come up with a refcount scheme in the TLS ensuring that registration and
  unregistration are only done when the first/last user comes/goes away.
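
The refcount scheme could look roughly like the sketch below. It assumes a
weak per-thread counter shared by all users, and it replaces the real
register/unregister system calls with hypothetical stubs that only count
calls. None of the example_* names are an existing API:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the real register/unregister system calls. */
static int example_kernel_register_calls;
static int example_kernel_unregister_calls;

static int example_kernel_register(void)
{
	example_kernel_register_calls++;
	return 0;
}

static int example_kernel_unregister(void)
{
	example_kernel_unregister_calls++;
	return 0;
}

/*
 * Weak, so that definitions carried by the application and by several
 * libraries resolve to a single per-thread counter at link time.
 */
__attribute__((weak)) __thread uint32_t example_rseq_refcount;

/* Take a reference; only the first user of the thread registers. */
static int example_rseq_get(void)
{
	if (example_rseq_refcount == UINT32_MAX)
		return -1;	/* would overflow */
	if (example_rseq_refcount++ == 0) {
		int ret = example_kernel_register();

		if (ret) {
			example_rseq_refcount--;
			return ret;
		}
	}
	return 0;
}

/* Drop a reference; only the last user of the thread unregisters. */
static int example_rseq_put(void)
{
	if (example_rseq_refcount == 0)
		return -1;	/* unbalanced put */
	if (--example_rseq_refcount == 0)
		return example_kernel_unregister();
	return 0;
}
```

With this, glibc, librseq and an early adopter could each call get/put for
the same thread, and the kernel would only see one registration and one
unregistration. The sketch leaves out nesting with signal handlers, which a
real implementation would need to consider.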

Thoughts ?

> Thanks,
> Florian

Mathieu Desnoyers
EfficiOS Inc.