Re: [RFC][PATCH] spin loop arch primitives for busy waiting

From: Nicholas Piggin
Date: Fri Apr 07 2017 - 07:26:55 EST


On Fri, 7 Apr 2017 11:43:49 +0200
Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> On Thu, Apr 06, 2017 at 10:31:46AM -0700, Linus Torvalds wrote:
> > But maybe "monitor" is really cheap. I suspect it's microcoded,
> > though, which implies "no".
>
> On my IVB-EP (will also try on something newer):
>
> MONITOR ~332 cycles
> MWAIT ~224 cycles (C0, explicitly invalidated MONITOR)
>
> So yes, expensive.

Interestingly, the Intel optimization manual says:

The latency of PAUSE instruction in prior generation microarchitecture
is about 10 cycles, whereas on Skylake microarchitecture it has been
extended to as many as 140 cycles.

Elsewhere the manual claims this change is an efficiency improvement.
Even at 140 cycles, PAUSE is still much cheaper than monitor+mwait
(~556 cycles combined) on your IVB, but if monitor+mwait is a bit
faster on Skylake it might become worth it.
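
To make the comparison concrete, here is a rough sketch of a PAUSE-based
busy wait next to the cycle counts quoted in this thread. The names
spin_relax(), spin_wait() and monitor_mwait_cycles() are made up for
illustration and are not from the patch; the numbers are just Peter's
IVB-EP measurements and the manual's PAUSE figures, not anything I
measured.

```c
#include <stdatomic.h>

/* Cycle counts quoted in this thread (IVB-EP measurements and the
 * Intel optimization manual); illustrative constants only. */
enum {
    MONITOR_CYCLES   = 332,
    MWAIT_CYCLES     = 224,
    PAUSE_OLD_CYCLES = 10,   /* "prior generation" per the manual */
    PAUSE_SKL_CYCLES = 140   /* "as many as" on Skylake */
};

/* Hypothetical helper: one relax step in a spin loop.
 * Issues PAUSE on x86 (GCC/Clang builtin), does nothing elsewhere. */
static inline void spin_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();
#endif
}

/* Hypothetical helper: spin until *flag becomes non-zero or max_spins
 * relax steps have been issued. Returns the number of relax steps. */
static unsigned spin_wait(atomic_int *flag, unsigned max_spins)
{
    unsigned spins = 0;

    while (!atomic_load_explicit(flag, memory_order_acquire) &&
           spins < max_spins) {
        spin_relax();
        spins++;
    }
    return spins;
}

/* Cost of arming and entering one MONITOR/MWAIT wait, using the
 * IVB-EP figures above. */
static unsigned monitor_mwait_cycles(void)
{
    return MONITOR_CYCLES + MWAIT_CYCLES;
}
```

With those figures, one monitor+mwait round costs about as much as
three or four 140-cycle PAUSEs, or several dozen of the old 10-cycle
ones, so the wait has to be fairly long before it could pay off.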