Re: [RFC PATCH v6 1/5] Thread-local ABI system call: cache CPU number of running thread
From: Mathieu Desnoyers
Date: Thu Apr 07 2016 - 08:34:50 EST
----- On Apr 7, 2016, at 8:03 AM, Florian Weimer fweimer@xxxxxxxxxx wrote:
> On 04/07/2016 01:19 PM, Peter Zijlstra wrote:
>> On Thu, Apr 07, 2016 at 12:39:21PM +0200, Florian Weimer wrote:
>>> On 04/07/2016 12:31 PM, Peter Zijlstra wrote:
>>>> On Thu, Apr 07, 2016 at 11:01:25AM +0200, Florian Weimer wrote:
>>>>>> Because ideally this structure would be part of the initial (glibc) TCB
>>>>>> with fixed offset etc.
>>>>>
>>>>> This is not possible because we have layering violations and code
>>>>> assumes it knows the precise size of the glibc TCB. I think Address
>>>>> Sanitizer is in this category. This means we cannot adjust the TCB size
>>>>> based on the kernel headers used to compile glibc, and there will have
>>>>> to be some indirection.
>>>>
>>>> So with the proposed fixed sized object it would work, right?
>>>
>>> I didn't see a proposal for a fixed size buffer, in the sense that the
>>> size of struct sockaddr_in is fixed.
>>
>> This thing proposed a single 64byte structure (with the possibility of
>> eventually adding more 64byte structures). Basically:
>>
>> struct tlabi {
>>         union {
>>                 __u8 __foo[64];
>>                 struct {
>>                         /* fields go here */
>>                 };
>>         };
>> } __attribute__((aligned(64)));
>
> That's not really "fixed size" as far as an ABI is concerned, due to the
> possibility of future extensions.

Hi Florian,

Let me try to spell out how I'm proposing to combine a
fixed-size structure with future extensions. I understand
that this trick might be a bit counterintuitive.

Initially, we define a fixed-size struct tlabi whose
length is 64 bytes. It is zero-padded and will never be
extended beyond 64 bytes. When we register it with the
system call, we pass the value 0 as the tlabi_nr parameter.
So far, the kernel only has to track a single pointer
and a 32-bit features mask per thread.
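
To make the fixed-size idea concrete, here is a minimal sketch of
such a structure with compile-time checks on its size and alignment.
Only the 64-byte size and alignment come from this thread; the
cpu_id field and the name of the padding member are assumptions made
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLABI_LEN 64

/* Sketch only: the field layout is assumed, not taken from the patch. */
struct tlabi {
	union {
		/* Pins the structure at 64 bytes regardless of the fields. */
		uint8_t pad[TLABI_LEN];
		struct {
			int32_t cpu_id;	/* assumed field: cached CPU number */
			/* later fields would consume the zero padding */
		};
	};
} __attribute__((aligned(TLABI_LEN)));

/* Fail the build if a new field ever pushes the size past 64 bytes. */
static_assert(sizeof(struct tlabi) == TLABI_LEN,
	      "struct tlabi must stay exactly 64 bytes");
static_assert(_Alignof(struct tlabi) == TLABI_LEN,
	      "struct tlabi must be 64-byte aligned");
```

With checks like these, adding a field that overflows the 64 bytes
breaks the build instead of silently changing the ABI.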

If we ever fill up those 64 bytes, we then need to define
a new struct tlabi_1. Its length may also be fixed at 64 bytes,
or at another size that we can decide at that time. When
userspace registers this new structure, it will pass the value 1
as the tlabi_nr parameter. At that point, the kernel will need
to track two pointers per thread, one for tlabi_nr=0 and one for
tlabi_nr=1. However, the kernel can combine the features_mask
of the two tlabi structures internally into a single uint64_t
bitmask per thread, and we can extend this to a larger
bitmask if we ever have more than 64 features.
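
The per-thread bookkeeping described above could look roughly like
the following user-space model. The names tlabi_thread_state and
tlabi_register_sketch are hypothetical, and placing the 32-bit mask
of structure N in bits 32*N to 32*N+31 is one possible way to
combine the masks, assumed here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLABI_NR_MAX 2	/* tlabi_nr = 0 and tlabi_nr = 1 */

/* Hypothetical per-thread state tracked on the kernel side. */
struct tlabi_thread_state {
	void *area[TLABI_NR_MAX];	/* one registered pointer per tlabi_nr */
	uint64_t features;		/* combined features of both structures */
};

/*
 * Model of registration: record the pointer for this tlabi_nr and
 * fold its 32-bit features_mask into the combined 64-bit mask.
 * Returns 0 on success, -1 on an unknown tlabi_nr or NULL area.
 */
static int tlabi_register_sketch(struct tlabi_thread_state *st,
				 unsigned int tlabi_nr, void *area,
				 uint32_t features_mask)
{
	unsigned int shift;

	if (tlabi_nr >= TLABI_NR_MAX || area == NULL)
		return -1;

	shift = 32 * tlabi_nr;
	st->area[tlabi_nr] = area;
	/* Clear this structure's 32-bit slot, then install the new mask. */
	st->features &= ~((uint64_t)UINT32_MAX << shift);
	st->features |= (uint64_t)features_mask << shift;
	return 0;
}
```

Registering a second structure then costs one more pointer per
thread, while feature tests stay a single 64-bit mask check.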

Now the question is: Peter Anvin claims that this scheme is
too complex, and that we should just have a dynamically-sized
area, whose size is the maximum of the size known by the
kernel and the size known by glibc. I'm trying to figure out
whether we can do that without adding NULL pointer checks, size
checks, and all sorts of extra code to the user-space fast-path.
If we cannot, going down the fixed-size structure route
would be justified.
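
To illustrate the fast-path concern, the sketch below contrasts a
read from a fixed-size structure with a read from a dynamically-sized
area. It does not describe either proposal's real layout: the struct,
the function names, and the way the area length reaches the reader
are all assumptions. It only shows the kind of extra checks I would
like to avoid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout, for illustration only. */
struct tlabi_fixed {
	int32_t cpu_id;
	uint8_t pad[60];
};

/* Fixed-size case: the field always exists, so the read is one load. */
static int32_t read_cpu_fixed(const struct tlabi_fixed *t)
{
	return t->cpu_id;
}

/*
 * Dynamically-sized case (hypothetical): the area may be absent or
 * too short to contain the field, so the reader checks both first.
 * Returns -1 when the field is not available.
 */
static int32_t read_cpu_dynamic(const void *area, size_t area_len)
{
	const size_t off = offsetof(struct tlabi_fixed, cpu_id);
	int32_t cpu;

	if (area == NULL)
		return -1;
	if (area_len < off + sizeof(cpu))
		return -1;
	memcpy(&cpu, (const char *)area + off, sizeof(cpu));
	return cpu;
}
```

Whether a dynamically-sized design can avoid such checks is exactly
the open question.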

Hopefully my summary here helps clarify a few points.

Thanks!
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com