Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

From: Mathieu Desnoyers
Date: Mon Feb 29 2016 - 07:42:08 EST


----- On Feb 29, 2016, at 5:39 AM, Arnd Bergmann arnd@xxxxxxxx wrote:

> On Monday 29 February 2016 11:32:21 Peter Zijlstra wrote:
>> On Sun, Feb 28, 2016 at 12:39:54AM +0000, Mathieu Desnoyers wrote:
>>
>> > /* This structure needs to be aligned cache line size. */
>> > struct thread_local_abi {
>> > int32_t cpu_id;
>> > uint32_t rseq_seqnum;
>> > uint64_t rseq_post_commit_ip;
>> > /* Add new fields at the end. */
>> > } __attribute__((packed));
>>
>> I would really not use packed; that can lead to horrible layout.
>>
>> Suppose someone would add:
>>
>> uint32_t foo;
>> uint64_t bar;
>>
>> With packed, you get an unaligned uint64_t in there, which is horrible.
>> Without packed, you get a hole, which you can later fill.
>

Actually, Peter is wrong about the hole there. On some 32-bit architectures
(i386, for instance), 64-bit integers are only aligned on 32 bits, not 64 bits.
So whether there is a hole depends on the architecture, and the structure
layout would then differ between ABIs, which would lead to a mess.

> What's making things worse is that on some architectures, adding
> __packed will force access by bytes rather than just reading
> a 32-bit or 64-bit numbers directly, so it's slow and non-atomic.

Agreed that, on many architectures, the compiler emits slower instructions
when reading from packed structures, which is unwanted.

Could we require that each field be naturally aligned, and that fields be
placed so that the compiler never has to add _any_ padding whatsoever? If
that's possible, then we could remove the packed attribute.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com