Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread
From: Thomas Gleixner
Date: Fri Feb 26 2016 - 11:31:45 EST
On Fri, 26 Feb 2016, Peter Zijlstra wrote:
> On Thu, Feb 25, 2016 at 05:17:51PM +0000, Mathieu Desnoyers wrote:
> > ----- On Feb 25, 2016, at 12:04 PM, Peter Zijlstra peterz@xxxxxxxxxxxxx wrote:
> > > On Thu, Feb 25, 2016 at 04:55:26PM +0000, Mathieu Desnoyers wrote:
> > >> ----- On Feb 25, 2016, at 4:56 AM, Peter Zijlstra peterz@xxxxxxxxxxxxx wrote:
> > >> The restartable sequences are intrinsically designed to work
> > >> on per-cpu data, so they need to fetch the current CPU number
> > >> within the rseq critical section. This is where the getcpu_cache
> > >> system call becomes very useful when combined with rseq:
> > >> getcpu_cache allows reading the current CPU number in a
> > >> fraction of a cycle.
> > >
> > > Yes yes, I know how restartable sequences work.
> > >
> > > But what I worry about is that they want a cpu number and a sequence
> > > number, and for performance it would be very good if those live in the
> > > same cacheline.
> > >
> > > That means either getcpu needs to grow a seq number, or restartable
> > > sequences need to _also_ provide the cpu number.
> > If we plan things well, we could have both the cpu number and the
> > seqnum in the same cache line, registered by two different system
> > calls. It's up to user-space to organize those two variables
> > to fit within the same cache-line.
> I feel this is more fragile than needed. Why not do a single system call
> that does both?
Right. There is no point in having two calls and two update mechanisms for a
very similar purpose.
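
To make the cache-line concern concrete, here is a minimal sketch of what a
combined per-thread area might look like. It is not taken from the patch set:
the struct name, field names, flag names and the 64-byte line size are all
assumptions made for illustration only.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on most current x86-64 parts. */
#define CACHE_LINE_SIZE 64

/* Hypothetical registration flags: which parts the kernel should update. */
#define THREAD_AREA_F_CPU_ID (1U << 0)
#define THREAD_AREA_F_SEQ    (1U << 1)

/*
 * Hypothetical combined area. Aligning the first member to the cache
 * line size forces the whole struct onto a line boundary, so cpu_id
 * and seq cannot straddle two lines.
 */
struct thread_area {
	alignas(CACHE_LINE_SIZE) int32_t cpu_id; /* current CPU number */
	uint32_t seq;                            /* rseq sequence counter */
};

/* Compile-time check that both fields share one cache line. */
static_assert(offsetof(struct thread_area, cpu_id) / CACHE_LINE_SIZE ==
	      offsetof(struct thread_area, seq) / CACHE_LINE_SIZE,
	      "cpu_id and seq must share a cache line");

/* Runtime helper: do two addresses fall within the same cache line? */
static inline int same_cache_line(const void *a, const void *b)
{
	return (uintptr_t)a / CACHE_LINE_SIZE ==
	       (uintptr_t)b / CACHE_LINE_SIZE;
}
```

With a single struct the layout is enforced by the compiler, rather than
relying on user space to place two separately registered variables next to
each other.
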
So let userspace have one struct where cpu/seq and whatever is required for
rseq is located and flag at register time which parts of the struct need to be