Re: [RFC PATCH 1/2] rseq: Implement KTLS prototype for x86-64
From: Florian Weimer
Date: Mon Sep 28 2020 - 11:14:34 EST
* Mathieu Desnoyers:
> Upstreaming efforts aiming to integrate rseq support into glibc led to
> interesting discussions, where we identified a clear need to extend the
> size of the per-thread structure shared between kernel and user-space
> (struct rseq). This is something that is not possible with the current
> rseq ABI. The fact that the current non-extensible rseq kernel ABI
> would also prevent glibc's ABI from being extended prevents its
> integration into glibc.
>
> Discussions with glibc maintainers led to the following design, which we
> are calling "Kernel Thread Local Storage" or KTLS:
>
> - at glibc library init:
>   - glibc queries the size and alignment of the KTLS area supported by the
>     kernel,
>   - glibc reserves the memory area required by the kernel for the main
>     thread,
>   - glibc registers the offset from the thread pointer where the KTLS area
>     will be placed for all threads belonging to the thread group which
>     are created with clone3 CLONE_RSEQ_KTLS,
> - at nptl thread creation:
>   - glibc reserves the memory area required by the kernel,
> - application/libraries can query glibc for the offset/size of the
>   KTLS area, and offset from the thread pointer to access that area.
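The placement step can be sketched as follows, assuming the kernel reports a size and a power-of-two alignment, and assuming x86-64's TLS variant II layout, where static TLS sits below the thread pointer. `struct ktls_layout`, `ktls_align_up` and `ktls_choose_offset` are hypothetical names, not a glibc or kernel interface; the sketch only shows the arithmetic of reserving an aligned area and deriving a single offset from the thread pointer.

```c
#include <stddef.h>

/* Hypothetical: size and alignment as the kernel might report them at
   library init.  Not a real kernel or glibc structure. */
struct ktls_layout {
    size_t size;   /* bytes the kernel needs per thread */
    size_t align;  /* required alignment; assumed to be a power of two */
};

/* Round n up to the next multiple of a (a must be a power of two). */
static size_t ktls_align_up(size_t n, size_t a)
{
    return (n + a - 1) & ~(a - 1);
}

/* Hypothetical placement for TLS variant II (x86-64): static TLS occupies
   the static_tls_size bytes directly below the thread pointer, so the KTLS
   area goes below that.  The returned offset is negative and a multiple of
   l.align, so tp + offset is suitably aligned whenever tp itself is aligned
   to at least l.align.  One offset is used for every thread, matching the
   constant-offset registration described above. */
static ptrdiff_t ktls_choose_offset(size_t static_tls_size,
                                    struct ktls_layout l)
{
    size_t below = ktls_align_up(static_tls_size + l.size, l.align);
    return -(ptrdiff_t)below;
}
```

For example, with 100 bytes of static TLS and a 32-byte area with 32-byte alignment, the offset comes out as -160, leaving the area entirely below the static TLS block.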
One remaining challenge I see is that we want to use vDSO functions to
abstract away the exact layout of the KTLS area. For example, there are
various implementation strategies for getuid optimizations, some of them
exposing a shared struct cred in a thread group, and others not doing
that.
The vDSO has access to the thread pointer because it is part of the ABI
(something that we recently, and quite conveniently, clarified for x86).
What it does not know is the offset of the KTLS area from the thread pointer.
In the original rseq implementation, this offset could vary from thread
to thread in a process, although the submitted glibc implementation did
not use this level of flexibility and the offset is constant. The vDSO
is not relocated by the run-time dynamic loader, so it can't use ELF TLS
data.
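The lookup problem can be illustrated with a small model. Here `tp` merely stands in for the thread pointer (the `%fs` base on x86-64; nothing reads the real register), and `struct ktls_area` with its `tid` and `err` fields is a hypothetical layout invented for this sketch. The accessor can compute the address only if the offset is passed in; a vDSO function has the thread pointer but cannot learn the offset from an ELF TLS variable, because the vDSO is not relocated by the dynamic loader.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical KTLS contents, for illustration only. */
struct ktls_area {
    uint32_t tid;  /* e.g. current thread ID */
    int err;       /* e.g. an errno slot */
};

/* Given a thread pointer and the KTLS offset, return the area's address.
   The offset is an explicit input: code that holds only the thread pointer
   (such as a vDSO function) has no ELF TLS variable from which to fetch it.
   If the offset may differ from thread to thread, a single process-wide
   value is not even enough. */
static struct ktls_area *ktls_from_tp(unsigned char *tp, ptrdiff_t offset)
{
    return (struct ktls_area *)(void *)(tp + offset);
}
```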
Furthermore, not all threads in a thread group may have an associated
KTLS area. In a potential glibc implementation, only the threads
created by pthread_create would have it; threads created directly using
clone would lack it (and would not even run with a correctly set up
userspace TCB).
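Code that consumes KTLS data therefore needs a path for threads that have no area. A minimal sketch, assuming a hypothetical per-thread pointer `ktls_self` that a pthread_create-style setup would fill in and that stays NULL for raw clone threads. How such code could reliably detect this is exactly the open bootstrap question (a raw clone thread may not even have a usable TCB, so even this check could not be trusted), so the pointer is purely illustrative. The fallback uses the real gettid system call through syscall(2).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical KTLS contents, for illustration only. */
struct ktls_tid_area {
    pid_t tid;
};

/* Hypothetical per-thread pointer: set by a pthread_create-like path that
   reserved a KTLS area, left NULL for threads created with raw clone. */
static _Thread_local struct ktls_tid_area *ktls_self;

/* Read the TID from the KTLS area when this thread has one; otherwise fall
   back to the gettid system call. */
static pid_t current_tid(void)
{
    if (ktls_self != NULL)
        return ktls_self->tid;
    return (pid_t)syscall(SYS_gettid);
}
```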
So we have a bootstrap issue here that needs to be solved, I think.
In most cases, I would not be too eager to bypass the vDSO completely
and have the kernel expose a data-only interface. I could perhaps make
an exception for the current TID, because that's so convenient to use
in mutex implementations, and for errno. With the latter, we could
directly expose the vDSO implementation to applications, assuming that
we agree that the vDSO will not fail with ENOSYS to request a fallback
to the system call, but will perform the system call itself.
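The two calling conventions for a vDSO function can be contrasted in a short sketch. Both `vdso_getuid_*` functions are hypothetical stand-ins (the real vDSO exports no getuid), and getuid is used only because the system call cannot fail. With the ENOSYS convention, every caller must carry its own fallback; with the second, the vDSO-side code issues the system call itself, so applications could call it directly.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Convention 1 (hypothetical): the vDSO function returns -ENOSYS when it
   cannot handle the request, and the caller must make the system call. */
static long vdso_getuid_enosys(void)
{
    return -ENOSYS;  /* pretend the fast path is unavailable */
}

static long getuid_via_enosys(long (*fn)(void))
{
    long r = (fn != NULL) ? fn() : -ENOSYS;
    if (r == -ENOSYS)
        r = syscall(SYS_getuid);  /* every caller repeats this fallback */
    return r;
}

/* Convention 2 (hypothetical): the vDSO function performs the system call
   itself when no fast path is available, so applications can call it
   directly.  A real vDSO would issue the raw system call and, for calls
   that can fail, would need somewhere to store errno; libc's syscall() is
   used here only to keep the sketch portable. */
static long vdso_getuid_direct(void)
{
    return syscall(SYS_getuid);
}
```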
Thanks,
Florian
--
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill