Re: [RFC PATCH 0/3] Implement getcpu_cache system call

From: Josh Triplett
Date: Tue Jan 12 2016 - 20:47:10 EST


On January 12, 2016 4:22:29 PM PST, Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>----- On Jan 12, 2016, at 4:02 PM, Ben Maurer bmaurer@xxxxxx wrote:
>
>>> One idea I have would be to let the kernel reserve some space either
>>> after the first stack address (for a stack growing down) or at the
>>> beginning of the allocated TLS area for each thread in
>>> copy_thread_tls() by fiddling with sp or the tls base address when
>>> creating a thread.
>>
>> Could this be implemented by having glibc use a well-known symbol name
>> to define the per-thread TLS area? If a high-performance application
>> wants to avoid any relocations when accessing this variable, it would
>> define the symbol itself, and that definition would override glibc's.
>> This is how things work with malloc: glibc has a default malloc
>> implementation, but we link jemalloc directly into our binaries. In
>> addition to changing the malloc implementation, this means that calls
>> to malloc don't go through the PLT.
>
>Just to make sure I understand your proposal: defining a well-known
>symbol with a weak attribute in glibc (or bionic...), e.g.:
>
>int32_t __thread __attribute__((weak)) __getcpu_cache;
>
>so that applications which care about bypassing the PLT can override
>it with:
>
>int32_t __thread __getcpu_cache;
>
>glibc/bionic would be responsible for calling the getcpu_cache() system
>call to register/unregister this TLS variable for each thread.
>
>One thing I would like to figure out is whether we can use this in a way
>that would allow introducing getcpu_cache() into applications and
>libraries (e.g. the lttng-ust tracer) before it gets implemented in
>glibc, while keeping forward compatibility for whenever it does get
>introduced in glibc.
>
>We can declare __getcpu_cache as a weak symbol in arbitrary libraries,
>and make them register/unregister the cache through the getcpu_cache
>syscall. The main thing I would need to tweak at the kernel level,
>within the system call, would be to keep a refcount of the number of
>times __getcpu_cache is registered per thread. This would allow
>multiple registrations, one per library (e.g. lttng-ust) and one for
>glibc, but we would validate that they all register the exact same
>address for a given thread.
>
>The reference-counting trick should also work for cases where
>applications define a non-weak __getcpu_cache and want to call the
>getcpu_cache system call to register it themselves (before glibc adds
>support for it).

This seems like something better done in a tiny common library, rather than in the kernel or by playing symbol resolution games.