Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread
From: Mathieu Desnoyers
Date: Thu Feb 25 2016 - 12:18:05 EST
----- On Feb 25, 2016, at 12:04 PM, Peter Zijlstra peterz@xxxxxxxxxxxxx wrote:
> On Thu, Feb 25, 2016 at 04:55:26PM +0000, Mathieu Desnoyers wrote:
>> ----- On Feb 25, 2016, at 4:56 AM, Peter Zijlstra peterz@xxxxxxxxxxxxx wrote:
>> The restartable sequences are intrinsically designed to work
>> on per-cpu data, so they need to fetch the current CPU number
>> within the rseq critical section. This is where the getcpu_cache
>> system call becomes very useful when combined with rseq:
>> getcpu_cache allows reading the current CPU number in a
>> fraction of a cycle.
> Yes yes, I know how restartable sequences work.
> But what I worry about is that they want a cpu number and a sequence
> number, and for performance it would be very good if those live in the
> same cacheline.
> That means either getcpu needs to grow a seq number, or restartable
> sequences need to _also_ provide the cpu number.
If we plan things well, we could have both the cpu number and the
seqnum in the same cache line, registered by two different system
calls. It's up to user-space to organize those two variables
to fit within the same cache-line.
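As a rough sketch of what "organizing those two variables" could look
like in user-space (the struct name, the field names, and the 64-byte
line size are assumptions made for illustration; they are not part of
either patch set, and no system call is made here):

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on most current x86-64 parts. */
#define CACHE_LINE_SIZE 64

/*
 * Hypothetical per-thread area. The address of cpu_id would be the one
 * registered through getcpu_cache, and the address of seqnum the one
 * registered through rseq.
 */
struct thread_percpu_area {
	alignas(CACHE_LINE_SIZE) int32_t cpu_id;
	uint32_t seqnum;
};

/* Fail the build if the two fields could ever straddle a line. */
static_assert(offsetof(struct thread_percpu_area, seqnum) + sizeof(uint32_t)
	      <= CACHE_LINE_SIZE,
	      "cpu_id and seqnum must share one cache line");

/* One instance per thread, e.g. owned by a single library. */
static _Thread_local struct thread_percpu_area percpu_area;

/* Returns nonzero if both fields of the calling thread's area share a line. */
static int percpu_area_shares_line(void)
{
	uintptr_t cpu_line = (uintptr_t)&percpu_area.cpu_id / CACHE_LINE_SIZE;
	uintptr_t seq_line = (uintptr_t)&percpu_area.seqnum / CACHE_LINE_SIZE;

	return cpu_line == seq_line;
}
```

Aligning the first field to the line size forces the whole struct onto a
line boundary, so the second field necessarily lands in the same line.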
The getcpu_cache GETCPU_CACHE_SET operation takes the address where
the CPU number should live as input.
The rseq system call could do the same for the seqnum address.
The question becomes: how do we introduce this to user-space,
considering that only a single address per thread is allowed
for each of getcpu_cache and rseq?
If both CPU number and seqnum are centralized in a TLS within
e.g. glibc, that would be OK, but if we intend to allow libraries
or applications to directly register their own getcpu_cache
address and/or rseq, we may end up in situations where we have
to fall back on using two different cache-lines. But how much
should we care about performance in cases where non-generic
libraries directly use those system calls?
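To illustrate the fallback case, a library that registers its own
addresses could at least detect whether the two ended up in different
cache lines. A minimal sketch, assuming a 64-byte line size and a
hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define ASSUMED_LINE_SIZE 64

/*
 * Hypothetical helper: returns nonzero if the two addresses fall within
 * the same cache line, e.g. the CPU number address and the seqnum address
 * registered by two unrelated libraries.
 */
static int same_cache_line(const void *a, const void *b)
{
	return (uintptr_t)a / ASSUMED_LINE_SIZE ==
	       (uintptr_t)b / ASSUMED_LINE_SIZE;
}
```

When this returns zero, the rseq fast path would touch two lines instead
of one, which is the performance cost in question.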