RE: [RFC PATCH 0/3] Implement getcpu_cache system call

From: Ben Maurer
Date: Tue Jan 12 2016 - 16:03:29 EST

> One idea I have would be to let the kernel reserve some space either after the
> first stack address (for a stack growing down) or at the beginning of the
> allocated TLS area for each thread in copy_thread_tls() by fiddling with
> sp or the tls base address when creating a thread.

Could this be implemented by having glibc use a well-known symbol name to define the per-thread TLS area? If a high-performance application wants to avoid any relocations when accessing this variable, it would define the symbol itself, and that definition would override glibc's. This is how things work with malloc: glibc has a default malloc implementation, but we link jemalloc directly into our binaries. In addition to changing the malloc implementation, this means that calls to malloc don't go through the PLT.
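
As a rough sketch of the application side of this idea: the symbol name getcpu_cache_tls, the struct layout, and the helper read_cached_cpu() below are all made up for illustration and are not defined by glibc or by this patch set. The point is only that a TLS variable defined in the main executable can be reached at a link-time-constant offset from the thread pointer.

```c
#include <stdint.h>

/*
 * Hypothetical layout of the per-thread area. In the real scheme the
 * kernel would keep the cpu field up to date; here nothing does.
 */
struct getcpu_cache_area {
	int32_t cpu;
};

/*
 * Definition in the main executable, under an assumed well-known name.
 * A C library providing a default definition of the same symbol would
 * be overridden by this one, much like linking in a replacement malloc.
 * -1 stands for "not yet populated" in this sketch.
 */
__thread struct getcpu_cache_area getcpu_cache_tls = { .cpu = -1 };

/*
 * Because the variable is defined in the executable itself, the
 * compiler and linker can resolve this access to a fixed offset from
 * the thread pointer, with no GOT lookup or __tls_get_addr() call.
 */
int32_t read_cached_cpu(void)
{
	return getcpu_cache_tls.cpu;
}
```

An application that does not care about the extra indirection would simply not define the symbol and pick up the library's default instead.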