Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

From: H. Peter Anvin
Date: Sat Feb 27 2016 - 10:07:57 EST


On February 27, 2016 6:15:01 AM PST, Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>----- On Feb 27, 2016, at 1:24 AM, H. Peter Anvin hpa@xxxxxxxxx wrote:
>
>> On 02/26/16 16:40, Mathieu Desnoyers wrote:
>>>>
>>>> I think it would be a good idea to make this a general pointer for
>the kernel to
>>>> be able to write per thread state to user space, which obviously
>can't be done
>>>> with the vDSO.
>>>>
>>>> This means the libc per thread startup should query the kernel for
>the size of
>>>> this structure and allocate thread local data accordingly. We can
>then grow
>>>> this structure if needed without making the ABI even more complex.
>>>>
>>>> This is more than a system call: this is an entirely new way for
>userspace to
>>>> interact with the kernel. Therefore we should make it a general
>facility.
>>>
>>> I'm really glad to see I'm not the only one seeing potential for
>>> genericity here. :-) This is exactly what I had in mind
>>> last year when proposing the thread_local_abi() system call:
>>> a generic way to register an extensible per-thread data structure
>>> so the kernel can communicate with user-space and vice-versa.
>>>
>>> Rather than having the libc query the kernel for size of the
>structure,
>>> I would recommend that libc tells the kernel the size of the
>thread-local
>>> ABI structure it supports. The idea here is that both the kernel and
>libc
>>> need to know about the fields in that structure to allow a two-way
>>> interaction. Fields known only by either the kernel or userspace
>>> are useless for a given thread anyway. This way, libc could
>statically
>>> define the structure.
>>
>> Big fat NOPE there. Why? Because it means that EVERY interaction
>with
>> this memory, no matter how critical, needs to be conditionalized.
>> Furthermore, userspace != libc. Applications or higher-layer
>libraries
>> might have more information than the running libc about additional
>> fields, but with your proposal libc would gate them.
>
>Good point!
>
>>
>> As far as the kernel providing the size in the structure (alone) -- I
>> *really* hope you can see what is wrong with that!! That doesn't
>mean
>> we can't provide it in the structure as well, and that too might
>avoid
>> the skipped libc problem.
>
>Indeed, libc would need to query the size before it can allocate
>the structure.
>
>>
>>> I would be tempted to also add "features" flags, so both user-space
>>> and the kernel could tell each other what they support: user-space
>>> would announce the set of features it supports, and it could also
>>> query the kernel for the set of supported features. One simple
>approach
>>> would be to use a uint64_t as type for those feature flags, and
>>> reserve the last bit for extending to future flags if we ever have
>>> more than 64.
>>>
>>> Thoughts ?
>>
>> It doesn't seem like it would hurt, although the size of the flags
>field
>> could end up being an issue.
>
>I'm concerned that this thread-local ABI structure may become messy.
>Let's just imagine how we would first introduce a "cpu_id" field
>(int32_t),
>and eventually add a "seqnum" field for rseq in the future (unsigned
>long).
>
>Both fields need to be read with single-copy semantics as volatile
>reads, and both need to be naturally aligned. However, I'm tempted
>to use the "packed" attribute on the structure since it's an ABI
>between kernel and user-space. A pretty bad example of what this
>could become, due to alignment constraints, looks like:
>
>/* This structure needs to be aligned on pointer size. */
>struct thread_local_abi {
> int32_t cpu_id;
> int32_t __unused1;
> unsigned long seqnum;
> /* Add new fields at the end. */
>} __attribute__((packed));
>
>And this is just a start. It may become messier as we append
>new fields in the future.
>
>The main argument I currently see in favor of having this
>meta system call for all per-thread features is to only
>maintain a single pointer in the kernel task_struct rather
>than one per thread-local feature.
>
>If the goal is really to keep the burden on the task struct
>small, we could use kmalloc()/kfree() to allocate and free an
>array of pointers to the various per-thread features, rather
>than putting them directly in task_struct. We could keep a
>mask of the enabled features in the task struct too (which
>we will likely have to do even if we go the the thread-local
>ABI meta system call).
>
>Having this per-task allocated pointer array at kernel-level
>would allow us to have one system call per feature, with clear
>semantics, without evolving a messy thread-local ABI structure
>due to all sorts of alignment constraints.
>
>Thoughts ?
>
>Thanks,
>
>Mathieu

I think you are worried about problems which we have already solved many, many times - structures are very common in the user space ABI and we know how to deal with this.

And when you say:

> However, I'm tempted
> to use the "packed" attribute on the structure
> since it's an ABI
> between kernel and user-space.

and mention "unsigned long" in a user space ABI all I can think of that you really have not followed the issues of user space ABI design as they have evolved over the last 20 years.

Simply put: non-problem.
--
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.