Re: [RFC PATCH 0/3] Implement getcpu_cache system call

From: Mathieu Desnoyers
Date: Thu Jan 14 2016 - 10:59:05 EST

----- On Jan 12, 2016, at 7:51 PM, Josh Triplett josh@xxxxxxxxxxxxxxxx wrote:

> On January 12, 2016 4:22:29 PM PST, Mathieu Desnoyers
> <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>>----- On Jan 12, 2016, at 4:02 PM, Ben Maurer bmaurer@xxxxxx wrote:
>>>> One idea I have would be to let the kernel reserve some space either
>>>> after the first stack address (for a stack growing down) or at the
>>>> beginning of the allocated TLS area for each thread in copy_thread_tls()
>>>> by fiddling sp or the tls base address when creating a thread.
>>> Could this be implemented by having glibc use a well known symbol name to
>>> define the per-thread TLS area? If a high performance application wants
>>> to avoid any relocations in accessing this variable, it would define it
>>> and that would override glibc's. This is how things work with malloc.
>>> glibc has a default malloc implementation, but we link jemalloc directly
>>> into our binary. In addition to changing the malloc implementation, this
>>> means that calls to malloc don't go through the PLT.
>>
>>Just to make sure I understand your proposal: defining a well known symbol
>>with a weak attribute in glibc (or bionic...), e.g.:
>>
>>int32_t __thread __attribute__((weak)) __getcpu_cache;
>>
>>so that applications which care about bypassing the PLT can override it
>>with:
>>
>>int32_t __thread __getcpu_cache;
>>
>>glibc/bionic would be responsible for calling the getcpu_cache() system
>>call to register/unregister this TLS variable for each thread.
>>
>>One thing I would like to figure out is whether we can use this in a way
>>that would allow introducing getcpu_cache() into applications and
>>libraries (e.g. the lttng-ust tracer) before it gets implemented in glibc,
>>in a way that keeps forward compatibility for whenever it gets introduced
>>in glibc.
>>
>>We can declare __getcpu_cache as a weak symbol in arbitrary libraries, and
>>make them register/unregister the cache through the getcpu_cache system
>>call.
>>
>>The main thing that I would need to tweak at the kernel level within the
>>system call would be to keep a refcount of the number of times
>>__getcpu_cache is registered per thread. This would allow multiple
>>registrations, one per library (e.g. lttng-ust) and one for glibc, but we
>>would require that they all register the exact same address for a given
>>thread.
>>
>>The reference counting trick should also work for cases where applications
>>define a non-weak __getcpu_cache and want to call the getcpu_cache system
>>call to register it themselves (before glibc adds support for it).
> This seems like something better done in a tiny common library, rather than the
> kernel or by playing symbol resolution games.
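
For illustration, the refcounted registration from the quoted proposal can be modeled in user space. This is a minimal sketch, not kernel code: struct cpu_cache_reg, cpu_cache_register(), cpu_cache_unregister() and the choice of -EBUSY/-EINVAL are hypothetical names and error codes, and real per-thread state would live in the kernel's task structure.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread registration state (user-space model only). */
struct cpu_cache_reg {
    int32_t *addr;          /* registered __getcpu_cache address, or NULL */
    unsigned long refcount; /* number of outstanding registrations */
};

/* Register addr for this thread. The first caller sets the address;
 * later callers (e.g. glibc and lttng-ust) must pass the same address. */
int cpu_cache_register(struct cpu_cache_reg *reg, int32_t *addr)
{
    if (addr == NULL)
        return -EINVAL;
    if (reg->refcount == 0) {
        reg->addr = addr;
        reg->refcount = 1;
        return 0;
    }
    if (reg->addr != addr)
        return -EBUSY; /* a different address is already registered */
    reg->refcount++;
    return 0;
}

/* Drop one registration; the address is detached only when the last
 * registrant unregisters. */
int cpu_cache_unregister(struct cpu_cache_reg *reg, int32_t *addr)
{
    if (reg->refcount == 0 || reg->addr != addr)
        return -EINVAL;
    if (--reg->refcount == 0)
        reg->addr = NULL;
    return 0;
}
```

In this model, two libraries registering &__getcpu_cache from the same thread both succeed and bring the count to 2; the address stays registered until both have unregistered.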

It does not cost much to recommend a specific symbol name and to mark the
symbol as weak in shared libraries. We could then also remove the "unregister"
command, which would mean that any library registering its cache could not be
unloaded. This would remove the need for the kernel to track registrations and
unregistrations with a reference count.
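
Without an "unregister" command, the per-thread state reduces to a single pointer. A minimal user-space sketch of that variant, assuming that re-registering the same address is accepted as a no-op and a different address is rejected; struct cpu_cache_slot, cpu_cache_register_once() and cpu_cache_thread_exit() are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread state: only the registered address. */
struct cpu_cache_slot {
    int32_t *addr; /* NULL until the first registration */
};

/* Register-only model: no unregister, hence no refcount.
 * Same address again: no-op. Different address: rejected. */
int cpu_cache_register_once(struct cpu_cache_slot *slot, int32_t *addr)
{
    if (addr == NULL)
        return -EINVAL;
    if (slot->addr == NULL) {
        slot->addr = addr;
        return 0;
    }
    return slot->addr == addr ? 0 : -EBUSY;
}

/* Thread exit is the only point where the registration is dropped. */
void cpu_cache_thread_exit(struct cpu_cache_slot *slot)
{
    slot->addr = NULL;
}
```

Compared with the refcounted model, any number of libraries can register the same address without the kernel counting them.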

We should then document that a registered cpu_cache should not be freed before
its associated thread exits.
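
On the library side, a weak TLS definition plus a per-thread "already registered" flag might be enough. A minimal sketch, assuming no usable syscall number for getcpu_cache(): hypothetical_getcpu_cache_register() and fake_kernel_slot are stand-ins invented so the code runs in user space, and getcpu_cache_ensure_registered() is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Weak TLS definition as a library might ship it. A strong definition in
 * the application (or, later, glibc) would take precedence. -1 = unknown. */
__thread int32_t __attribute__((weak)) __getcpu_cache = -1;

/* Per-thread flag: each library registers at most once per thread. */
static __thread int getcpu_cache_registered;

/* Hypothetical stand-in for the proposed system call: it only records the
 * address so that this sketch can run without kernel support. */
static __thread int32_t *fake_kernel_slot;

static int hypothetical_getcpu_cache_register(int32_t *addr)
{
    if (fake_kernel_slot != NULL && fake_kernel_slot != addr)
        return -1;
    fake_kernel_slot = addr;
    return 0;
}

/* Call from each thread before its first read of __getcpu_cache. There is
 * no unregister: the __thread storage is released only at thread exit. */
int getcpu_cache_ensure_registered(void)
{
    if (getcpu_cache_registered)
        return 0;
    if (hypothetical_getcpu_cache_register(&__getcpu_cache) != 0)
        return -1;
    getcpu_cache_registered = 1;
    return 0;
}
```

Because the cache is __thread storage, it lives exactly as long as its thread, which satisfies the rule above; the remaining constraint is that the library defining it must not be unloaded.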

Would it be simple enough, or too simplistic?



Mathieu Desnoyers
EfficiOS Inc.